(57) Abstract

In compression encoding of a digital signal, such as MPEG2, transform coefficients are quantised with the lower bound of each
interval being controlled by a parameter λ. In the MPEG2 reference coder, for example, λ = 0.75. Because the quantised coefficients are
variable length coded, improved quality or reduced bit rates can be achieved by controlling λ so as to vary dynamically the bound of each
interval with respect to the associated representation level. The parameter λ can vary with coefficient amplitude, with frequency, or with
quantisation step size. In a transcoding operation, λ can also vary with parameters in the initial coding operation.




WO 98/38800



PCT/GB98/00582 



DIGITAL SIGNAL COMPRESSION ENCODING WITH IMPROVED QUANTISATION 

This invention relates to the compression of digital video, audio or 
other signals. 

Compression encoding generally involves a number of separate
techniques. These will usually include a transformation, such as the
block-based discrete cosine transform (DCT) of MPEG-2; an optional prediction
step; a quantisation step; and variable length coding. This invention is
particularly concerned in this context with quantisation.

The quantisation step maps a range of original amplitudes onto the
same representation level. The quantisation process is therefore irreversible.
MPEG-2 (in common with other compression standards such as MPEG-1,
JPEG, CCITT/ITU-T Rec. H.261 and ITU-T Rec. H.263) defines representation
levels and leaves undefined the manner in which the original amplitudes are
mapped onto a given set of representation levels.

In general terms, a quantizer assigns to an input value, which may be
continuous or may previously have been subjected to a quantisation process,
a code usually selected from quantization levels immediately above and
immediately below the input value. The error in such a quantization will
generally be minimised if the quantization level closest to the input value is
selected. In a compression system, it is further necessary to consider the
efficiency with which respective quantization levels may be coded. In variable
length coding, the quantization levels which are employed most frequently are
assigned the shortest codes.

Typically, the zero level has the shortest code. A decision to assign a
higher quantization level, on the basis that it is the closest, rather than a lower
level (and especially the zero level) will therefore decrease coding efficiency.
In MPEG2, the overall bit rate of the compressed signal is maintained
beneath a pre-determined limit by increasing the separation of quantization
levels in response to a tendency toward higher bit rates. Repeated decisions
to assign quantization levels on the basis of which is closest may, through
coding inefficiency, thus lead to a coarser quantization process.

The behaviour of a quantizer in this respect may be characterised
through a parameter λ which is arithmetically combined with the input value,
with one value of λ (typically λ = 1) representing the selection of the closest
quantization level or "rounding". A different value of λ (typically λ = 0) will in
contrast represent the automatic choice of the lower of the two nearest
quantization levels, or "truncating". In the MPEG2 reference coder, an
attempt is made to compromise between the nominal reduction in error which
is the attribute of rounding and the tendency toward bit rate efficiency which is
associated with truncating, by setting a standard value for λ of λ = 0.75.

Whilst particular attention has here been paid to MPEG2 coding,
similar considerations apply to other methods of compression encoding of a
digital signal which include the steps of conducting a transformation
process to generate values and quantising the values through partitioning the
amplitude range of a value into a set of adjacent intervals, whereby each
interval is mapped onto a respective one of a set of representation levels
which are to be variable length coded, such that a bound of each interval is
controlled by a parameter λ. The transformation process may take a large
variety of forms, including block-based transforms such as the DCT of
MPEG2, and sub-band coding.

It is an object of one aspect of the present invention to provide an 
improvement in such a method which enables higher quality to be achieved at 
a given bitrate or a reduction in bitrate for a given level of quality. 

Accordingly, the present invention is in one aspect characterised in
that λ is controlled so as to vary dynamically the bound of each interval with
respect to the associated representation level.

Suitably, each value is arithmetically combined with λ.
Advantageously, λ is:

a function of the quantity represented by the value;
where the transformation is a DCT, a function of horizontal and vertical frequency;
a function of the quantisation step size; or
a function of the amplitude of the value.

In a particular form of the present invention, the digital signal to be
encoded has been subjected to previous encoding and decoding processes
and λ is controlled as a function of a parameter in said previous encoding and
decoding processes.

In a further aspect, the present invention consists in a (q, λ) quantiser
operating on a set of transform coefficients x_k representative of respective
frequency indices f_k in which λ is dynamically controlled in dependence upon
the values of x_k and f_k.

Advantageously, λ is dynamically controlled to minimise a cost function
D + μH where D is a measure of the distortion introduced by the quantisation
in the uncompressed domain and H is a measure of compressed bit rate.

The invention will now be described by way of example with reference
to the accompanying drawings, in which:

Figure 1 is a diagram illustrating the relationships between
representation levels, decision levels and the value of λ;

Figure 2 is a block diagram representation of the quantization process
in the MPEG2 reference coder;

Figure 3 is a block diagram representation of a simplified and improved
quantization process;

Figure 4 is a block diagram representation of the core elements of
Figure 3;

Figure 5 is a block diagram representation of a quantization process
according to one aspect of the present invention; and

Figure 6 is a block diagram representation of a quantization process
according to a further aspect of the present invention.

In the specifically mentioned compression standards, the original
amplitude x results from a discrete cosine transform (DCT) and is thus related
to a horizontal frequency index f_hor and a vertical frequency index f_ver. Whilst
this approach is taken as an example in what follows, the invention is not
restricted in this regard.

In general, a quantiser describes a mapping from an original amplitude
x of frequencies f_hor and f_ver onto an amplitude y = Q(x). The mapping
performed by the quantiser is fully determined by the set of representation
levels {r_i} and by the corresponding decision levels {d_i} as illustrated in
Figure 1. All original amplitudes in the range d_i ≤ x < d_{i+1} are mapped onto
the same representation level y = Q(x) = r_i. As can be seen from Figure 1,
consecutive decision levels are related by the quantisation step size q:

$$d_{i+1} = d_i + q \tag{1}$$

and for a given representation level r_i, the corresponding decision level is
calculated as:

$$d_i = r_i - \frac{\lambda}{2} \cdot q \tag{2}$$

The quantiser is fully specified by the quantisation step-size q and the
parameter λ for a given set of representation levels {r_i}. Therefore, a
quantiser that complies with equations (1) and (2) can be referred to as a
(q, λ) quantiser.
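As an illustration of equations (1) and (2), the following minimal Python sketch lists the decision levels of a (q, λ) quantiser and locates the interval that contains a given amplitude. It assumes uniformly spaced representation levels r_i = i·q, which is an assumption made for the example; the names `decision_levels` and `level_index` are hypothetical.

```python
from bisect import bisect_right


def decision_levels(q, lam, count):
    """Return d_1..d_count with d_i = r_i - (lam / 2) * q, assuming r_i = i * q."""
    return [(i - lam / 2.0) * q for i in range(1, count + 1)]


def level_index(x, q, lam, count=64):
    """Index i of the interval [d_i, d_{i+1}) that contains a non-negative x.

    Index 0 denotes the interval [0, d_1), which maps onto the zero level.
    """
    return bisect_right(decision_levels(q, lam, count), x)


if __name__ == "__main__":
    # lam = 0 truncates, lam = 1 rounds, lam = 0.75 lies in between.
    for lam in (0.0, 0.75, 1.0):
        print(lam, decision_levels(8, lam, 3), level_index(4.9, 8, lam))
```

Lowering λ moves every decision level upwards by the same amount, so more amplitudes fall into the lower (cheaper) level of each pair.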

Currently proposed quantisers, as described in the reference coders
for the H.261, H.263, MPEG-1 and MPEG-2 standards, all apply a special
type of (q, λ) quantiser in that a fixed value of λ is used: for example λ = 0.75
in the MPEG-2 reference coder or λ = 1.0 in the MPEG-1 reference coder for
quantisation of intra-DCT-coefficients.

According to one aspect of this invention, λ is not constant but is a
function that depends on the horizontal frequency index f_hor, the vertical
frequency index f_ver, the quantisation step-size q and the amplitude x:

$$\lambda = \lambda(f_{hor}, f_{ver}, q, x) \tag{3}$$

Examples of ways in which the function may usefully be derived to improve
picture quality in video compression at a given bit-rate - or to reduce the
required bit-rate at a given picture quality - will be set out below.

The invention extends also to the case of transcoding when a first
generation amplitude y_1 = Q_1(x) is mapped onto a second generation
amplitude y_2 = Q_2(y_1) to further reduce the bit-rate from the first to the second
generation without having access to the original amplitude x. In this case the
first generation quantiser Q_1 and the second generation quantiser Q_2 are
described as a (q_1, λ_1)-type quantiser and a (q_2, λ_2)-type quantiser,
respectively. The second generation value λ_2 is described as a function:

$$\lambda_2 = \lambda_2(f_{hor}, f_{ver}, q_1, \lambda_1, q_2, \lambda_{2,ref}, y_1) \tag{4}$$

The parameter λ_{2,ref} that appears in Eqn. (4) is applied in a reference
(q_2, λ_{2,ref})-type quantiser. This reference quantiser bypasses the first
generation and directly maps an original amplitude x onto a second
generation reference amplitude y_{2,ref} = Q_{2,ref}(x).

The functional relationship of Eqn. (4) can be used to minimise the
error (y_2 - y_{2,ref}) or the error (y_2 - x). In the first case, the resulting second
generation quantiser may be called a maximum a posteriori (MAP) quantiser.
In the second case, the resulting second generation quantiser may be called
a mean squared error (MSE) quantiser. Examples of the second generation
(q_2, λ_{2,MAP})-type and (q_2, λ_{2,MSE})-type quantisers are given below. For a more
detailed explanation of the theoretical background, reference is directed to
the paper "Transcoding of MPEG-2 intra frames" - Oliver Werner - IEEE
Transactions on Image Processing 1998, which will for ease of reference be
referred to hereafter as "the Paper". A copy of the Paper is appended to
British patent application No. 9703831 from which the present application
claims priority.

The present invention refers specifically to quantization of 'intra' DCT
coefficients in MPEG2 video coding but can be applied to non-intra
coefficients, to other video compression schemes and to compression of
signals other than video. In MPEG2, the prior art is provided by what is
known as Test Model 5 (TM5). The quantization scheme of TM5 for positive
intra coefficients is illustrated in Figure 2.

In order to simplify the description, the above diagram will be replaced 
by Figure 3, which illustrates essentially the same quantizer except for small 
values of q, where it corrects an anomaly as described in the Paper. 
In this quantizer, the incoming coefficients are first divided by quantizer
weighting matrix values, W, which depend on the coefficient frequency but
which are fixed across the picture, and then by a quantizer scale value q
which can vary from one macroblock to the next but which is the same for all
coefficient frequencies. Prior to the adder, the equivalent inverse quantizer
reconstruction levels are simply the integers 0, 1, 2, ... . A fixed number λ/2
is then added to the value and the result truncated. The significance of λ is
that a value of 0 makes the quantizer (of the value input to the adder) a
simple truncation, while a value of 1 makes it a rounding operation. In TM5,
the value of λ is fixed at 0.75.
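The signal path just described (divide by W, divide by q, add λ/2, truncate) can be sketched in a few lines of Python. This is a simplified illustration only: it ignores the integer scaling and clipping of a real TM5 encoder, and the function name `quantise_level` and the mirroring of negative inputs are assumptions made for the example.

```python
import math


def quantise_level(coeff, weight, qscale, lam=0.75):
    """Return the integer level for one coefficient.

    coeff  : DCT coefficient (negative inputs are mirrored, an assumption here)
    weight : weighting matrix value W for this frequency
    qscale : quantizer scale value q for the macroblock
    lam    : 0 gives truncation, 1 gives rounding, 0.75 is the TM5 value
    """
    scaled = abs(coeff) / weight / qscale      # reconstruction levels are now 0, 1, 2, ...
    level = math.floor(scaled + lam / 2.0)     # adder followed by truncation
    return -level if coeff < 0 else level


if __name__ == "__main__":
    for lam in (0.0, 0.75, 1.0):
        print(lam, quantise_level(120, 16, 2, lam))
```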
Attention will hereafter be focused on the operation of the 'core'
quantizer shown in Figure 4.

In a class of MPEG-2 compatible quantisers for intra frame coding,
non-negative original dct-coefficients x (or the same coefficients after division
by weighting matrix values W) are mapped onto the representation levels as:

$$y = Q(x) = \left\lfloor \frac{x}{q} + \frac{\lambda}{2} \right\rfloor \cdot q \tag{5}$$

The floor function ⌊a⌋ extracts the integer part of the given argument a.

Negative values are mirrored:

$$y = -Q(|x|) \tag{6}$$

The amplitude range of the quantisation step-size q in eq. (5) is
standardised; q has to be transmitted as side information in every MPEG-2 bit
stream. This does not hold for the parameter λ in eq. (5). This parameter is
not needed for reconstructing the dct-coefficients from the bit stream, and is
therefore not transmitted. However, the λ-value controls the mapping of the
original dct-coefficients x onto the given set of representation levels

$$r_i = i \cdot q, \quad i = 0, 1, 2, \ldots \tag{7}$$

According to eq. (5), the (positive) x-axis is partitioned by the decision
levels

$$d_i = \left(i - \frac{\lambda}{2}\right) \cdot q, \quad i = 1, 2, \ldots \tag{8}$$

Each x ∈ [d_i, d_{i+1}) is mapped onto the representation level y = r_i. As
a special case, the interval [0, d_1) is mapped onto y = 0.

The parameter λ can be adjusted for each quantisation step-size q,
resulting in a distortion rate optimised quantisation: the mean-squared-error

$$D = E\left[(x - y)^2\right] \tag{9}$$

is minimised under a bit rate constraint imposed on the coefficients y. In
order to simplify the analysis, the first order source entropy

$$H = \sum_i -P_i \cdot \log_2 P_i \tag{10}$$

of the coefficients y instead of the MPEG-2 codeword table is taken to
calculate the bit rate. It has been verified in the Paper that the entropy H can
be used to derive a reliable estimate for the number of bits that result from the
MPEG-2 codeword table. In Eqn. (10), P_i denotes the probability for the
occurrence of the coefficient y = r_i.
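The first order entropy of Eqn. (10) can be estimated from the relative frequencies of the quantised levels. The short Python sketch below does this for a list of levels; the function name `first_order_entropy` is hypothetical, and using empirical frequencies in place of the probabilities P_i is an assumption of the example.

```python
import math
from collections import Counter


def first_order_entropy(levels):
    """Entropy in bits per coefficient, H = sum_i -P_i * log2(P_i).

    P_i is estimated as the relative frequency of level i in `levels`.
    """
    total = len(levels)
    counts = Counter(levels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    # Half the coefficients are zero, a quarter each are 1 and 2.
    print(first_order_entropy([0, 0, 1, 2]))
```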

The above constrained minimisation problem can be solved by
applying the Lagrange multiplier method, introducing the Lagrange
multiplier μ. One then gets the basic equation to calculate the quantisation
parameter λ:

$$\frac{\partial D}{\partial \lambda} + \mu \cdot \frac{\partial H}{\partial \lambda} = 0 \tag{11}$$

Note that the solution for λ that one obtains from Eqn. (11) depends
on the value of μ. The value of μ is determined by the bit rate constraint

$$H \le H_0 \tag{12}$$

where H_0 specifies the maximum allowed bit rate for encoding the
coefficients y. In general, the amplitude range of the Lagrange multiplier is
0 < μ < ∞. In the special case of H_0 → ∞, one obtains μ → 0. Conversely for
H_0 → 0, one obtains in general μ → ∞.

The Laplacian probability density function (pdf) is an appropriate model
for describing the statistical distribution of the amplitudes of the original
dct-coefficients. This model is now applied to evaluate analytically Eqn. (11).
One then obtains a distortion-rate optimised quantiser characteristic by
inserting the resulting value for λ in eq. (5).

Due to the symmetric quantiser characteristic for positive and negative
amplitudes in Eqns. (5) and (6), we introduce a pdf p for describing the
distribution of the absolute original amplitudes |x|. The probability P_0 for the
occurrence of the coefficient y = 0 can then be specified as

$$P_0 = \int_0^{(1 - \lambda/2) \cdot q} p(x)\, dx \tag{13}$$

Similarly, the probability P_i for the coefficient |y| = r_i becomes

$$P_i = \int_{(i - \lambda/2) \cdot q}^{(i + 1 - \lambda/2) \cdot q} p(x)\, dx, \quad i = 1, 2, \ldots \tag{14}$$

With Eqns. (13) and (14), the partial derivative of the entropy H of eq.
(10) can be written after a straightforward calculation as

(15)
From eq. (9) one can first deduce

$$D = \int_0^{(1 - \lambda/2) \cdot q} x^2 \cdot p(x)\, dx + \sum_{i \ge 1} \int_{(i - \lambda/2) \cdot q}^{(i + 1 - \lambda/2) \cdot q} (x - i \cdot q)^2 \cdot p(x)\, dx \tag{16}$$

and further from eq. (16)

(17)

It can be seen from eq. (17) that

$$\frac{\partial D}{\partial \lambda} \le 0 \quad \text{if } 0 \le \lambda \le 1 \tag{18}$$

Thus, when λ is increased from zero to one, the resulting distortion D
is monotonically decreasing until the minimum value is reached for λ = 1.
The latter is the solution to the unconstrained minimisation of the
mean-squared-error; however, the resulting entropy H will in general not fulfil the bit
rate constraint of eq. (12).
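As a hedged sketch of the calculation behind this monotonic behaviour (not necessarily in the form of the original eqs. (15) and (17)), one can differentiate eqs. (10), (13), (14) and (16) with respect to λ. The sketch assumes decision levels d_i = (i - λ/2)·q, so that each d_i moves by -q/2 per unit change of λ, and assumes that the probabilities P_i sum to one:

$$\frac{\partial D}{\partial \lambda} = -\frac{q^3}{2}\,(1 - \lambda) \sum_{i \ge 1} p(d_i), \qquad \frac{\partial H}{\partial \lambda} = \frac{q}{2} \sum_{i \ge 1} p(d_i)\, \log_2 \frac{P_{i-1}}{P_i}$$

Under these assumptions the first expression is non-positive for 0 ≤ λ ≤ 1 and vanishes at λ = 1, and the second is positive whenever each level is less probable than the one below it.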

Under the assumption of P_i > P_{i+1} in eq. (15), we see that ∂H/∂λ > 0.
Thus, there is a monotonic behaviour: when λ is increased from zero to one,
the resulting distortion D monotonically decreases, while at the same time the
resulting entropy H monotonically increases. An iterative
algorithm can be derived immediately from this monotonic behaviour. The parameter λ is
initially set to λ = 1, and the resulting entropy H is computed. If H is larger
than the target bit rate H_0, the value of λ is decreased in further iteration steps
until the bit rate constraint, eq. (12), is fulfilled. While this iterative procedure
forms the basis of a simplified distortion-rate method proposed for
transcoding of I-frames, we continue to derive an analytical solution
for λ.
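The iterative procedure can be sketched as follows. This is a minimal Python illustration under stated assumptions: the entropy is estimated from empirical level frequencies, λ is lowered on a fixed grid of `steps` equal decrements, and the names `entropy_bits` and `fit_lambda` are hypothetical.

```python
import math
from collections import Counter


def entropy_bits(levels):
    """First order entropy (bits per coefficient) of a list of integer levels."""
    n = len(levels)
    return -sum(c / n * math.log2(c / n) for c in Counter(levels).values())


def fit_lambda(coeffs, q, target_bits, steps=20):
    """Start at lam = 1 and lower lam until the entropy meets the target.

    Returns (lam, entropy). If even lam = 0 does not meet the target,
    (0.0, entropy at lam = 0) is returned and a coarser q would be needed.
    """
    h = 0.0
    for k in range(steps, -1, -1):
        lam = k / steps
        levels = [math.floor(abs(x) / q + lam / 2.0) for x in coeffs]
        h = entropy_bits(levels)
        if h <= target_bits:
            return lam, h
    return 0.0, h


if __name__ == "__main__":
    data = [0.5 * n for n in range(40)]
    print(fit_lambda(data, 4.0, 10.0))  # generous target: lam stays at 1
    print(fit_lambda(data, 4.0, 2.2))   # tighter target: lam may be lowered
```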



Eqns. (15) and (17) can be evaluated for the Laplacian model:

(19)

After inserting the model pdf of Eqn. (19) in Eqns. (15) and (17), it can
be shown that the basic equation (11) leads then to the analytical solution
for λ,

(20)

with z = e^{-αq} and the 'z'-entropy

$$h(z) = -z \cdot \log_2 z - (1 - z) \cdot \log_2 (1 - z) \tag{21}$$



Eqn. (20) provides only an implicit solution for λ, as the probability P_0
on the right hand side depends on λ according to eq. (13). In general, the
value of P_0 can be determined only for known λ by applying the quantiser
characteristic of Eqns. (5) and (6) and counting the relative frequency of the
event y = 0. However, eq. (20) is a fixed-point equation for λ which becomes
more obvious if the right hand side is described by the function

(22)

resulting in the classical fixed-point form λ = g(λ). Thus, it follows from the
fixed point theorem of Stefan Banach that the solution for λ can be found by
an iterative procedure with

$$\lambda_{j+1} = g(\lambda_j) \tag{23}$$

in the (j + 1)-th iteration step. The iteration of (23) converges towards the
solution for an arbitrary initial value λ_0 if the function g is 'self-contracting', i.e.
Lipschitz-continuous with a Lipschitz-constant smaller than one. As an
application of the mean value theorem of differential calculus, it is not difficult
to prove that g is always 'self-contracting' if the absolute value of the partial
derivative is less than one. This yields the convergence condition

$$1 > \left| \frac{\partial g}{\partial \lambda} \right| \tag{24}$$
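The fixed-point iteration λ_{j+1} = g(λ_j) can be sketched generically in Python. The function g below is purely hypothetical, chosen only because it is a contraction (its slope has magnitude 0.15); it is not the right hand side of eq. (20). The name `fixed_point` and the stopping tolerance are likewise assumptions of the example.

```python
def fixed_point(g, lam0, tol=1e-9, max_iter=100):
    """Iterate lam <- g(lam) until successive values differ by less than tol."""
    lam = lam0
    for _ in range(max_iter):
        nxt = g(lam)
        if abs(nxt - lam) < tol:
            return nxt
        lam = nxt
    raise RuntimeError("fixed-point iteration did not converge")


def example_g(lam):
    """Hypothetical contraction with |dg/dlam| = 0.15 < 1 (illustration only)."""
    return 1.0 - 0.3 * (0.5 + 0.5 * lam)


if __name__ == "__main__":
    # Both starting values should approach the same solution, 0.85 / 1.15.
    print(fixed_point(example_g, 0.0), fixed_point(example_g, 1.0))
```

Because the slope of the example function is well below one in magnitude, the result does not depend on the starting value, which mirrors the convergence condition stated above.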
A distortion-rate optimised quantisation method will now be derived
based on the results obtained above. As an example, a technique is outlined
for quantising the AC-coefficients of MPEG-2 intra frames. It is
straightforward to modify this technique for quantising the dct-coefficients of
MPEG-2 inter frames, i.e. P- and B-frames.

Firstly, one has to take into account that the 63 AC-coefficients of an
8x8 dct-block do not share the same distribution. Thus, an individual
Laplacian model pdf according to eq. (19) with parameter α_i is assigned to
each AC-frequency index i. This results in an individual quantiser
characteristic according to Eqns. (5) and (6) with parameter λ_i. Furthermore,
the quantisation step-size q_i depends on the visual weight w_i and a
frequency-independent qscale parameter as

$$q_i = \frac{w_i \cdot qscale}{16} \tag{25}$$

For a given step-size q_i, the quantisation results in a distortion D_i(λ_i)
and a bit rate H_i(λ_i) for the AC-coefficients of the same frequency index i. As
the dct is an orthogonal transform, and as the distortion is measured by the
mean-squared-error, the resulting distortion D in the spatial (sample/pixel)
domain can be written as

$$D = c \cdot \sum_i D_i(\lambda_i) \tag{26}$$

with some positive normalising constant c. Alternatively the distortion can be
measured in the weighted coefficient domain in order to compensate for the
variation in the human visual response at different spatial frequencies.
Similarly, the total bit rate H becomes

$$H = \sum_i H_i(\lambda_i) \tag{27}$$



For a distortion rate optimised quantisation, the 63 parameters λ_i have to be
adjusted such that the cost function

$$D + \mu \cdot H \tag{28}$$

is minimised. The non-negative Lagrange multiplier μ is determined by the bit
rate constraint

$$H \le H_0 \tag{29}$$

Alternatively, if the distortion is expressed in the logarithmic domain as:

$$D' = 20 \log_{10} D \ \text{dB} \tag{28a}$$

the cost function to be minimised becomes:

$$B = D' + \mu' \cdot H \tag{28b}$$

where μ' is now an a priori constant linking distortion to bit rate.

A theoretical argument based on coding white noise gives a law of
6 dB per bit per coefficient. In practice, observation of actual coding results
at different bit rates gives a law of k dB per bit, where k takes values from
about 5 to about 8 depending on the overall bit rate. In practice, the intuitive
'6 dB' law corresponds well with observation.

Additionally, the qscale parameter can be changed to meet the bit rate
constraint of Eqn. (29). In principle, the visual weights w_i offer another
degree of freedom but for simplicity we assume a fixed weighting matrix as in
the MPEG-2 reference decoder. This results in the following distortion rate
optimised quantisation technique which can be stated in a 'C'-language-like
form:
/* Begin of quantising the AC-coefficients in MPEG-2 intra frames */

D_min = ∞;   /* so that the first qscale tested is always accepted */
for (qscale = q_min; qscale <= q_max; qscale = qscale + 2)   /* linear qscale table */
{
    μ = 0;
    do {
        Step 1: determine λ_1, λ_2, ..., λ_63 by minimising D + μ·H;
        Step 2: calculate H = Σ H_i(λ_i);
        μ = μ + δ;   /* δ to be selected appropriately */
    } while (H > H_0);
    Step 3: calculate D = c·Σ D_i(λ_i);
    if (D < D_min) {
        D_min = D;
        qscale_opt = qscale;
        for (i = 1; i <= 63; i = i + 1) λ_i,opt = λ_i;
    }
}

for (i = 1; i <= 63; i = i + 1)
{
    q_i = w_i · qscale_opt / 16;
    quantise all AC-coefficients of frequency-index i by q_i and λ_i,opt
}

/* End of quantising the AC-coefficients in MPEG-2 intra frames */
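The search above can be sketched in runnable form. The Python example below is a simplified illustration, not the method of the specification: it picks each λ_i from a small candidate grid, estimates H_i from empirical level frequencies, sets c = 1, and adds an upper bound `mu_max` on μ so that the inner loop always terminates. All names (`LAMBDAS`, `rate_distortion`, `optimise`) are hypothetical.

```python
import math
from collections import Counter

LAMBDAS = [k / 8 for k in range(9)]  # assumed candidate grid 0, 0.125, ..., 1


def rate_distortion(coeffs, q, lam):
    """Entropy (bits per coefficient) and mean squared error of a (q, lam) quantiser."""
    levels = [math.floor(abs(x) / q + lam / 2.0) for x in coeffs]
    n = len(coeffs)
    bits = -sum(c / n * math.log2(c / n) for c in Counter(levels).values())
    mse = sum((abs(x) - l * q) ** 2 for x, l in zip(coeffs, levels)) / n
    return bits, mse


def optimise(bands, weights, target_bits, qscales=range(2, 64, 2),
             delta=1.0, mu_max=200.0):
    """bands[i] holds the coefficients of frequency index i, weights[i] its w_i."""
    best = None
    for qscale in qscales:
        # Pre-compute (H_i, D_i) for every candidate lam of every band.
        tables = [[rate_distortion(band, w * qscale / 16.0, lam) for lam in LAMBDAS]
                  for band, w in zip(bands, weights)]
        mu = 0.0
        while True:
            # Step 1: per band, pick the lam that minimises D_i + mu * H_i.
            picks = [min(range(len(LAMBDAS)), key=lambda k: t[k][1] + mu * t[k][0])
                     for t in tables]
            # Step 2: total bit rate for these choices.
            bits = sum(t[k][0] for t, k in zip(tables, picks))
            if bits <= target_bits or mu >= mu_max:
                break
            mu += delta
        if bits > target_bits:
            continue  # this qscale cannot meet the target within mu_max
        # Step 3: total distortion (c = 1 assumed).
        dist = sum(t[k][1] for t, k in zip(tables, picks))
        if best is None or dist < best["distortion"]:
            best = {"qscale": qscale, "lambdas": [LAMBDAS[k] for k in picks],
                    "bits": bits, "distortion": dist}
    return best


if __name__ == "__main__":
    demo = [[float(n) for n in range(20)], [float(n % 5) for n in range(20)]]
    print(optimise(demo, [16, 32], target_bits=3.0))
```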
There are several options for performing Step 1 - Step 3:

1. Options for performing Step 1

The parameters λ_1, λ_2, ..., λ_63 can be determined

a) analytically by applying Eqns. (20)-(23) of Section 3.

b) iteratively by dynamic programming of D + μ·H, where either of the
options described in the next points can be used to calculate D and H.

2. Options for performing Step 2

H = Σ H_i(λ_i) can be calculated

a) by applying the Laplacian model pdf, resulting in

$$H = \sum_i \left[ h(P_{0,i}) + (1 - P_{0,i}) \cdot \frac{h(z_i)}{1 - z_i} \right] \tag{32}$$

where h(P_{0,i}) and h(z_i) are the entropies as defined in eq. (21) of P_{0,i} (eq.
(13)) and z_i = e^{-α_i q_i}, respectively. Note that P_{0,i} in Eqn. (32) can be
determined by counting for each dct-frequency index i the relative frequency
of the zero-amplitude y = Q_i(x) = 0. Interestingly, eq. (32) shows that the
impact of the quantisation parameters λ_i on the resulting bit rate H only
consists in controlling the zero-amplitude probabilities P_{0,i}.

b) from a histogram of the original dct-coefficients, resulting with Eqns.
(10), (13) and (14) in

$$H = -\sum_i \sum_l P_{l,i} \cdot \log_2 P_{l,i} \tag{33}$$

c) by applying the MPEG-2 codeword table.

3. Options for performing Step 3

D = c·Σ D_i(λ_i) can be calculated

a) by applying the Laplacian model pdf of Eqn. (19) and evaluating
Eqn. (16).

b) by calculating D = E[(x - y)^2] directly from a histogram of the original
dct-coefficients x.

Depending on which options are chosen for Step 1 - Step 3, the
proposed method results in a single pass encoding scheme if the Laplacian
model pdf is chosen or in a multi pass scheme if the MPEG-2 codeword table
is chosen. Furthermore, the method can be applied on a frame, macroblock
or on a 8x8-block basis, and the options can be chosen appropriately. The
latter is of particular interest for any rate control scheme that sets the target
bit rate H_0 either locally on a macroblock basis or globally on a frame basis.

Furthermore, we note that the proposed method skips automatically
high-frequency dct-coefficients if this is the best option in the rate-distortion
sense. This is indicated if the final quantisation parameter λ_i,opt has a value
close to one for low-frequency indices i but a small value, e.g. zero, for
high-frequency indices.

A distortion-rate optimised quantisation method for MPEG-2
compatible coding has been described, with several options for an
implementation. The invention can immediately be applied to standalone
(first generation) coding. In particular, the results help in designing a
sophisticated rate control scheme.

The quantiser characteristic of eqs. (5) and (6) can be generalised to

$$y = Q(x) = r(x) + \left\lfloor \frac{x - r(x)}{q(x)} + \frac{\lambda(x)}{2} \right\rfloor \cdot q(x) \tag{34}$$

for non-negative amplitudes x. The floor function ⌊a⌋ in eq. (34) returns the
integer part of the argument a. Negative amplitudes are mirrored,

$$y = -Q(|x|) \tag{35}$$

The generalisation is reflected by the amplitude dependent values
λ(x), q(x), r(x) in eq. (34). For a given set of representation levels
... < r_{l-1} < r_l < r_{l+1} < ... and a given amplitude x, the pair of consecutive
representation levels is selected that fulfils

$$r_{l-1} \le x < r_l \tag{36}$$

The value of the local representation level is then set to

$$r(x) = r_{l-1} \tag{37}$$

The value of the local quantisation step-size results from

$$q(x) = q_l = r_l - r_{l-1} \tag{38}$$

A straightforward extension of the rate-distortion concept detailed
above yields for the local lambda parameter, very similar to eq. (20),

$$\lambda(x) = \lambda_l = 1 - \frac{\mu}{q_l^2} \cdot \log_2\left(\frac{P_{l-1}}{P_l}\right) \quad (l = 1, \ldots, L) \tag{39}$$

Similar to eqs. (13), (14), the probabilities in eq. (39) depend on the
lambda parameters,

(40)

and

(41)

Therefore, eq. (39) represents a system of non-linear equations for
determining the lambda parameters λ_l. In general, this system can
only be solved numerically.

However, eq. (39) can be simplified if the term log_2(P_{l-1}/P_l)
is interpreted as the difference

$$I_l - I_{l-1} = \log_2\left(\frac{P_{l-1}}{P_l}\right) \tag{42}$$

of optimum codeword lengths

$$I_l = -\log_2 P_l, \quad I_{l-1} = -\log_2 P_{l-1} \tag{43}$$

associated with the representation levels r_l, r_{l-1}.

A practical implementation of the above will now be described.
Once the probability distribution, parametric or actual, of the
unquantized coefficients is known, it is possible to choose a set of quantizer
decision levels that will minimise the cost function B, because both the
entropy H and the distortion D are known as functions of the decision levels
for a given probability distribution. This minimization can be performed
off-line and the calculated sets of decision levels stored for each of a set of
probability distributions.

In general, it will be seen that the optimum value of λ corresponding to
each decision level is different for different coefficient amplitudes. In practice,
it appears that the greatest variation in the optimum value of λ with amplitude
is apparent between the innermost quantizer level (the one whose
reconstruction level is 0) and all the other levels. This means that it may be
sufficient in some cases to calculate, for each coefficient index and for each
value (suitably quantized) of the probability distribution parameter, two values
of λ, one for the innermost quantizer level and one for all the others.

A practical approach following the above description is shown in 
Figure 5. 

The DCT coefficients are taken to a linear quantizer 52 providing the
input to a histogram building unit 54. The histogram is thus based on linearly
quantized versions of the input DCT coefficients. The level spacing of that
linear quantizer 52 is not critical but should probably be about the same as
the average value of q. The extent of the histogram function required
depends on the complexity of the parametric representation of the pdf; in the
case of a Laplacian or Gaussian distribution it may be sufficient to calculate
the mean or variance of the coefficients, while in the 'zero excluded' Laplacian
used in the Paper it is sufficient to calculate the mean and the proportion of
zero values. This histogram, which may be built up over a picture period or
longer, is used in block 56 as the basis of an estimate of the pdf parameter or
parameters, providing one of the inputs to the calculation of λ in block 58.

Another input to the calculation of λ is from a set of comparators 60
which are in effect a coarse quantizer, determining in which range of values
the coefficient to be quantized falls. In the most likely case described above,
it is sufficient to compare the value with the innermost non-zero
reconstruction level. The final input required to calculate λ is the quantizer
scale.

In general, an analytical equation for λ cannot be obtained. Instead, a
set of values can be calculated numerically for various combinations of pdf
parameters, comparator outputs and quantizer scale values, and the results
stored in a lookup table. Such a table need not be very large (it may, for
example, contain fewer than 1000 values) because the optima are
not very sharp.

The value of λ calculated is then divided by 2 and added in adder 62 to
the coefficient prior to the final truncation operation in block 64.
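The final stage of Figure 5, in its simplest two-value form, can be sketched as below. The sketch assumes the coefficient has already been divided by the weighting value and the quantizer scale, so that the innermost non-zero reconstruction level is 1; the parameter names `lam_inner` and `lam_outer` are hypothetical stand-ins for the two values that would come from the lookup table.

```python
import math


def quantise_two_lambda(scaled, lam_inner, lam_outer):
    """Quantise a non-negative, pre-scaled coefficient with an amplitude-dependent lam.

    scaled    : coefficient after division by W and by the quantizer scale
    lam_inner : lam used below the innermost non-zero reconstruction level (1.0)
    lam_outer : lam used for all other amplitudes
    """
    lam = lam_inner if scaled < 1.0 else lam_outer  # comparator acting as coarse quantizer
    return math.floor(scaled + lam / 2.0)           # adder followed by truncation


if __name__ == "__main__":
    # A small lam_inner pushes more small coefficients to zero.
    print(quantise_two_lambda(0.7, 0.4, 0.9), quantise_two_lambda(0.7, 0.8, 0.9))
```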

Instead of using variable codeword lengths that depend on the current
probabilities according to eq. (43), a fixed table of variable codeword lengths
$C_0, \ldots, C_L$ can be applied to simplify the process. The values of $C_0, \ldots, C_L$ can
be determined in advance by designing a single variable length code, i.e. a
Huffman code, for a set of training signals and bit rates. In principle, they can
also be obtained directly from the MPEG2 variable-length code table. The
only complication is the fact that MPEG2 variable-length coding is based on
combinations of runs of zero coefficients terminated by non-zero coefficients.

One solution to this problem is to estimate 'equivalent codeword
lengths' from the MPEG2 VLC tables. This can be done quite easily if one
makes the assumption that the probability distributions of the DCT
coefficients are independent of each other. Another possibility is to consider
the recent past history of quantization within the current DCT block to
estimate the likely effect of each of the two possible quantization levels on the
overall coding cost.

Then, eq. (39) changes to 

Mx) - =^ I - -^{Ci -Cm; M = 1 L) ' (44) 

The resulting distortion-rate optimised quantisation algorithm is
essentially the same as detailed previously except that the lambda
parameters are calculated either from eq. (39) or eq. (44) for each pair of
horizontal and vertical frequency indices.

A simplified method of calculating $\lambda(x)$ will now be described, where
only the local distortion is considered for each coefficient.

Here, we make use of the fact that the variable-length code (VLC)
table used for a given picture in MPEG2 is fixed and known. This should
simplify and make more accurate the calculations of the trade-off between bit
rate and distortion. In particular, the calculations can be made on a
coefficient basis since the effect on the bit rate of the options for quantizing a
particular coefficient is immediately known. The same is true (although a little
more difficult to justify) of the effect on the quantizing distortion.

If we accept the assumptions implied in the above paragraph, then we
can very simply calculate the value of the decision level to minimize the local
contribution to the cost function B. This will in fact be the level at which the
reduction in the bit count obtained by quantizing to the lower reconstruction
level (rather than the higher level) is offset exactly by the corresponding
increase in quantizing distortion.

If the two reconstruction levels being considered have indices $i$ and
$i + 1$, the corresponding codewords have lengths $L_i$ and $L_{i+1}$, and the
quantizer scale is $q$, then:

(i) the reduction in bit count is $L_{i+1} - L_i$.

(ii) the local increase in distortion is $20 \log_{10}\bigl(q(1 - \lambda/2)\bigr) - 20 \log_{10}\bigl(q\lambda/2\bigr)$.

Combining these using the law linking distortion to bit rate, we have

$$6(L_{i+1} - L_i) = 20 \log_{10}(2/\lambda - 1) \qquad (45)$$

or, more simply

$$L_{i+1} - L_i = \log_2(2/\lambda - 1) \qquad (46)$$

leading to

$$\lambda = \frac{2}{1 + 2^{(L_{i+1} - L_i)}} \qquad (47)$$

This elegant result shows that the value of $\lambda$ depends here only on the
difference in bit count between the higher and lower quantizer reconstruction
levels.
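
As a small illustration of this relationship, the following Python sketch evaluates the parameter from a codeword length difference, assuming the closed form of eq. (47), i.e. 2 divided by (1 plus 2 raised to the length difference). The function name is hypothetical and not part of the described apparatus.

```python
def lambda_from_length_difference(delta_bits: float) -> float:
    """Parameter value for a codeword length difference L_(i+1) - L_i (assumed eq. (47))."""
    return 2.0 / (1.0 + 2.0 ** delta_bits)


# Equal codeword lengths give the midpoint decision level (lambda = 1);
# each additional bit for the higher level lowers lambda.
for delta in (0, 1, 2, 3):
    print(delta, lambda_from_length_difference(delta))
```

A table of this kind, indexed by the length difference, is one possible content for a coding cost lookup table.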

The fact that the value of $\lambda$ is now independent both of the coefficient
probability distribution and the quantizer scale leads to the following, much
simplified implementation shown in Figure 6.

Here, the DCT coefficients are passed to the side-chain truncate
block 70 before serving as the address in a coding cost lookup table 72. The
value of lambda/2 is provided to adder 76 by block 74 and the output is
truncated in truncate block 78.

There have been described a considerable number of ways in which 
the present invention may be employed to improve quantisation in a coder; 
still others will be evident to the skilled reader. It should be understood that 
the invention is also applicable to transcoding and switching. 

The question will now be addressed of a two-stage quantiser. This
problem is addressed in detail in the Paper, which sets out the theory of
so-called maximum a-posteriori (MAP) and mean squared error (MSE)
quantisers. By way of further exemplification there will now be described an
implementation of the MAP and MSE quantiser for transcoding of MPEG2
[MPEG2] intra AC-coefficients that result from an 8x8 discrete cosine
transform (dct).

The class of the first generation quantisers $y_1 = Q_1(x)$ specified by
these equations is spanned by the quantisation step-size $q_1$ and the
parameter $\lambda_1$; such a quantiser is called a $(q_1, \lambda_1)$-type quantiser.

In the transcoder, the first generation coefficients $y_1$ are mapped onto
the second generation coefficients $y_2 = Q_2(y_1)$ to further reduce the bit rate.
Under the assumption of a $(q_1, \lambda_1)$-type quantiser in the first generation, e.g.
MPEG2 reference coder TM5, it follows from the results set out in the Paper
that the MAP quantiser $Q_{2,map}$ and the MSE quantiser $Q_{2,mse}$ can be
implemented as a $(q_2, \lambda_{2,map})$-type and a $(q_2, \lambda_{2,mse})$-type quantiser,
respectively. For both the MAP and the MSE quantiser, the second
generation step-size $q_2$ is calculated from the second generation parameters
$W_2$ and $qscale_2$. However, there are different equations for calculating $\lambda_{2,map}$
and $\lambda_{2,mse}$.






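With the results of the Paper, it follows that $\lambda_{2,map}$ can be calculated as

$$\lambda_{2,map} = \lambda_{2,ref} + (\mu_{map} - \lambda_1) \cdot \frac{q_1}{q_2} \qquad (48)$$

and $\lambda_{2,mse}$ as

$$\lambda_{2,mse} = 1 + (\mu_{mse} - \lambda_1) \cdot \frac{q_1}{q_2} \qquad (49)$$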



The parameter $\lambda_{2,ref}$ can be changed in the range $0 \le \lambda_{2,ref} \le 1$ for
adjusting the bit rate and the resulting signal-to-noise-ratio. This gives an
additional degree of freedom for the MAP quantiser compared with the MSE
quantiser. The value of $\lambda_{2,ref} = 0.9$ is particularly preferred. The parameter
$\mu_{map}$ and the parameter $\mu_{mse}$ are calculated from the first generation
quantisation step-size $q_1$ and a z-value,



-2 1 - (l - In ( z"' ) ) . z"' 
'inCz'')- I - z"' (5^^ 



The amplitude range of the values that result from these equations can
be limited to the range $0 \le \mu_{map}, \mu_{mse} \le 2$. Similarly, the amplitude range of
the resulting values can be limited to $0 < \lambda_{2,map}, \lambda_{2,mse} \le 2$.
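
The calculation of the second generation parameters, including the range limiting, might be sketched in Python as follows. The sketch reads eqs. (48) and (49) as: the reference value (or 1 for the MSE case) plus the difference between the mu parameter and the first generation lambda, scaled by the ratio of first to second generation step size. The function names, the floating-point arithmetic and the closed clipping interval [0, 2] are assumptions for illustration; the mu values are taken as given inputs.

```python
def clip(value: float, low: float, high: float) -> float:
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def lambda2_map(lam2_ref: float, mu_map: float, lam1: float, q1: float, q2: float) -> float:
    """Second generation lambda for the MAP quantiser (assumed reading of eq. (48))."""
    mu_map = clip(mu_map, 0.0, 2.0)
    return clip(lam2_ref + (mu_map - lam1) * q1 / q2, 0.0, 2.0)


def lambda2_mse(mu_mse: float, lam1: float, q1: float, q2: float) -> float:
    """Second generation lambda for the MSE quantiser (assumed reading of eq. (49))."""
    return lambda2_map(1.0, mu_mse, lam1, q1, q2)


# Illustrative numbers only: first generation lambda 0.75, step sizes 8 and 16.
print(lambda2_map(0.9, 0.8, 0.75, 8.0, 16.0))
print(lambda2_mse(0.8, 0.75, 8.0, 16.0))
```

Expressing the MSE case through the MAP function mirrors the remark that the reference value is the additional degree of freedom of the MAP quantiser.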

The z-value has a normalised amplitude range, i.e. $0 < z < 1$, and can
be calculated either from the first generation dct-coefficients $y_1$ or from the
original dct-coefficients $x$ as described in the Paper. In the latter case, the
z-value is transmitted as additional side information, e.g. user data, along with
the first generation bit stream so that no additional calculation of z is required
in the transcoder. Alternatively, a default z-value may be used. An individual
z-value is assigned to each pair of horizontal and vertical frequency indices.
This results in 63 different z-values for the AC-coefficients of an 8x8 dct. As a
consequence of the frequency dependent z-values, the parameters $\lambda_{2,map}$ and
$\lambda_{2,mse}$ are also frequency dependent, resulting in 63 $(q_2, \lambda_{2,map})$-type
quantisers and 63 $(q_2, \lambda_{2,mse})$-type quantisers, respectively. Additionally,
there are different parameter sets for the luminance and the chrominance
components. The default z-values for the luminance and chrominance
components are shown in Table 1 and Table 2 respectively.
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TABLE 1 

Normalised z-values, i.e. 256 x z, for luminance (default)
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0 
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250 
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232 
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231 
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237 


231 


226 
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222 
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226 


231 


7 


222 


211 


210 


205 


202 


208 


214 


222 






TABLE 2

Normalised z-values, i.e. 256 x z, for chrominance (default)

        0     1     2     3     4     5     6     7

  0     -   245   242   230   212   176   158   179
  1   246   240   233   219   193   154   156   177
  2   239   233   224   209   180   148   154   173
  3   229   221   211   196   163   141   150   166
  4   219   208   198   181   153   133   143   166
  5   207   193   182   171   140   126   143   161
  6   193   176   162   154   127   118   137   163
  7   169   145   148   129   102   108   127   158



For a description of preferred techniques for making available to 
subsequent coding and decoding processes, information relating to earlier 
coding and decoding processes, reference is directed to EP-A-0 765 576; 
EP-A-0 807 356 and WO-A-9803017. 
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Transcoding of MPEG-2 intra frames 
1. Introduction 

Transcoding is the key technique to further reduce the bit rate of a previously compressed
image signal. In contrast to a standalone source encoder, a transcoder has only access to a
previously compressed signal that already contains quantisation noise when compared to the
original source signal. Thus, the bit stream output of the transcoder is the result of cascaded
coding with a so called first generation encoder in the first stage followed by the transcoder,
see Fig. 1.



[Block diagram: s -> enc.-1 -> b1 -> transc. -> b2]

s = source signal, enc.-1 = first generation encoder, transc. = transcoder
$b_1$ / $b_2$ = first / second generation bit stream

Fig. 1: Cascaded coding as a result of first generation encoding and subsequent transcoding

It is assumed throughout that the first generation bit stream $b_1$ that defines the input of the
transcoder and the second generation bit stream $b_2$ that represents the output are both MPEG-2
[MPEG-2] compliant. Hence, $b_1$ and $b_2$ can be passed on to an MPEG-2 decoder. In MPEG-2,
motion compensating prediction (mcp) is combined with the discrete cosine transform (dct) in
a hybrid coding algorithm [TM5-93]. In general both elements, mcp and dct coding, can be
exploited for efficient transcoding. Compared to inter frames, i.e. P- and B-frames, mcp is
switched off in intra frames (I-frames). Therefore, only dct coding can be exploited to transcode
I-frames. In this paper we concentrate on MPEG-2 compatible transcoding of I-frames. A
generalized block diagram of an MPEG-2 I-frame decoder is given in Fig. 2.

[Block diagram: b1(2) -> vld -> y1(2) -> idct -> s1(2)]

vld = variable length decoder, idct = inverse discrete cosine transform
$b_{1(2)}$ = first (second) generation bit stream, $y_{1(2)}$ = decoded dct-coeff. of first (second) generation
$s_{1(2)}$ = reconstructed image signals of first (second) generation

Fig. 2: Generalized MPEG-2 I-frame decoder

The corresponding first generation encoder and transcoder are detailed in Fig. 3 and Fig. 4,
respectively.



[Block diagram: s -> dct -> x -> Q1 -> y1 -> vlc -> b1]

dct = discrete cosine transform, $Q_1$ = first generation quantiser, vlc = variable length encoder
s = original source signal, $x$ = original dct-coeff., $y_1$ = dct-coeff. of first generation
$b_1$ = first generation bit stream

Fig. 3: Generalized MPEG-2 compatible first generation I-frame encoder





[Block diagram: b1 -> vld -> y1 -> Q2 -> y2 -> vlc -> b2]

vld = variable length decoder, $Q_2$ = second generation quantiser, vlc = variable length encoder
$b_1$ = first generation bit stream, $y_1$ = dct-coeff. of first generation
$b_2$ = second generation bit stream, $y_2$ = dct-coeff. of second generation

Fig. 4: Generalized MPEG-2 compatible I-frame transcoder



As can be seen from Fig. 4, the element of the transcoder to further reduce the bit rate is the
second generation quantiser $Q_2$. Hence, the fundamental issue of transcoding is the design of $Q_2$.
To the author's knowledge, only a few publications have previously addressed this problem.
In [BT-94] and [Sarnoff-96] it is suggested to implement in $Q_2$ essentially the same quantiser
characteristic that is used for $Q_1$ in the first generation encoder, Fig. 3. In this case $Q_1$ and $Q_2$
have the same shape, e.g. a uniform quantiser characteristic, and differ only in their level of
coarseness, i.e. the representation levels of $Q_2$ are more widely spaced compared to the
representation levels of $Q_1$. Another approach to specify $Q_2$ is described in [Columbia-95]. Each
dct-coefficient $y_1$ of the first generation is checked, and a decision is made whether to retain
or skip it, i.e. $y_2 = Q_2(y_1)$, where either $Q_2(y_1) = y_1$ or $Q_2(y_1) = 0$ holds. The decision whether to
retain or skip is determined in an iterative optimisation procedure on a frame basis.
Experimental results given in [Columbia-95] indicate that, to a large extent, high frequency dct-coefficients
are skipped and low frequency dct-coefficients are retained. Therefore, this type of
quantisation results in dct-based low pass filtering, which in general carries the risk of introducing
visible block artefacts. Whilst this approach may be considered for transcoding for small bit rate
changes between the first and the second generation, it appears questionable whether skipping
of dct-coefficients functions well in general. Unfortunately, experimental results in
[Columbia-95] are only given for transcoding between 4 and 3 Mbit/s, but not e.g. between 9 and
3 Mbit/s. In [Sarnoff-96] the authors show in their experimental results that, even for transcoding
between 4 and 3 Mbit/s, skipping of dct-coefficients is inferior compared with the above
re-quantisation where $Q_1$ and $Q_2$ have the same shape and differ only in their level of coarseness.
However, different algorithms for skipping of dct-coefficients have been used in [Sarnoff-96]
and in [Columbia-95].

This paper provides a theoretical analysis of the transcoding problem. The formal description
of the second generation quantiser $Q_2$ includes the suggestions mentioned above of
[BT-94], [Columbia-95] and [Sarnoff-96] as special cases. From the results of the analysis, we derive
theoretically the optimum quantiser characteristic $Q_2$ for both the mean-squared-error (mse) cost
function and a so-called maximum-a-posteriori (map) cost function. The difference between
these two cost functions is explained, along with pointing out in which case each cost function
is more suitable. In order to efficiently apply the optimum mse- and map-quantiser
characteristics in a transcoder, it is necessary to model the statistics of the original dct-coefficients $x$, see
Fig. 3. This paper proposes a parametric model. This model is first validated with real image
data before it is used in the experiments to evaluate the mse- and map-quantiser
characteristics. For reference, the results are compared with the performance of the quantiser
characteristic of MPEG-2 test model TM5 [TM5-93].

The rest of the paper is organised as follows. In section 2, a formal description of an MPEG-2
compatible quantiser is introduced. Two examples suitable for the quantiser of a first generation
encoder are discussed and compared. Section 3 focusses on the second generation quantiser $Q_2$
used in the transcoder. The problem of designing $Q_2$ is analysed by making a comparison to a
reference quantiser $Q_{2,ref}$. The reference quantiser is used in a standalone encoder that
by-passes the first generation stage and directly compresses the original signal to the desired bit
rate of the second generation. In section 4, two methods of designing $Q_2$ are proposed,
explained, and compared. The first method minimises the mse cost function, the second
minimises the map related costs. In section 5 a parametric model to describe the statistics of the original
dct-coefficients is introduced and validated with real image data. Based on this parametric
model an analytical evaluation for both the mse and the map cost function is carried out in
section 6. Experimental results are discussed in section 7. Finally, conclusions and suggestions
for future work are given in section 8.

2. MPEG-2 compatible quantisation in a first generation encoder 

An MPEG-2 I-frame is partitioned into blocks of 8x8 samples of the original signal s. As shown
in Fig. 3, each block is submitted to the dct, resulting in 64 dct-coefficients $x_i$. The
DC-coefficient $x_0$ is separately quantised and encoded. As the DC-quantiser characteristic is fixed for
each frame, we concentrate on the AC-coefficients $x_i$, $i = 1, 2, \ldots, 63$, with amplitude range
$|x_i| < 1024$. Without loss of generality, the frequency index $i$ can be fixed for the following
discussion. For clarity the frequency index $i$ is therefore omitted. The original dct-coefficient $x$ is
passed to the first generation quantiser $Q_1$. The MPEG-2 standard [MPEG-2] specifies in its
normative part the set of representation levels $y_1 = Q_1(x)$; in an MPEG-2 decoder, see Fig. 2, a
representation level $y_1$ is reconstructed as

$$y_1 = l_1 \cdot q_1, \qquad (1)$$

with the quantisation step-size

$$q_1 = \frac{W_1}{16} \cdot qscale_1. \qquad (2)$$

The amplitude level $l_1$ can take an integer value out of the allowed amplitude range $|l_1| \le 2047$,
and is transmitted as (8x8)-block data in the first generation bit stream $b_1$; $b_1$ includes as
additional side information the values of $W_1$ and $qscale_1$ needed to calculate the quantisation
step-size $q_1$. The value of $W_1$ can be set for each frame and depends on the frequency index, thus
taking into account the frequency dependent properties of human visual perception. The value
of $qscale_1$ does not depend on the frequency index and can be changed on a macroblock basis
within each frame. A macroblock consists of four luminance and two co-sited chrominance
blocks, each of 8x8 samples. An MPEG-2 compatible quantiser complies with the reconstruction
rules of (1) and (2). Additionally, $y_1$ has to be rounded to an integer, which is omitted in our
discussion to ease the notation. For a given step-size $q_1$ there is no unique MPEG-2 compatible
quantiser $y_1 = Q_1(x)$ because of the remaining degree of freedom how to map the set of original
samples $x$ onto the given set of representation levels.

As an example, the quantiser characteristic can be specified for non-negative values $x$ as

$$y_1 = Q_1(x) = \left\lfloor \frac{x}{q_1} + \frac{\lambda_1}{2} \right\rfloor \cdot q_1 \qquad (3)$$

where the floor function $\lfloor a \rfloor$ extracts the integer part of the given argument $a$. The quantiser
characteristic of (3) can be mirrored for negative values of $x$.

The parameter $\lambda_1$ in eq. (3) determines how the positive x-axis is partitioned into half-open
intervals $[d_{1l}, d_{1(l+1)})$. Every interval is defined by two consecutive decision levels $d_{1l}$ with
$d_{1l} = 0$ for $l = 0$ and

$$d_{1l} = \left(l - \frac{\lambda_1}{2}\right) \cdot q_1 \qquad (5)$$

for $l \ge 1$. According to (3), each $x \in [d_{1l}, d_{1(l+1)})$ is mapped onto the same representation
level

$$r_{1l} = l \cdot q_1, \qquad (6)$$

see Fig. 5.

Fig. 5: Decision and representation levels of the quantiser in eq. (3)
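
The relationship between the parameter and the decision levels can be made concrete with a short Python sketch. It assumes decision levels of the form (l - lambda/2) times the step size for l >= 1 (with zero for l = 0) and representation levels of l times the step size; the function names and the example step size of 8 are illustrative assumptions.

```python
def decision_level(l: int, q1: float, lam1: float) -> float:
    """Lower bound of the l-th interval on the non-negative x-axis (assumed form)."""
    return 0.0 if l == 0 else (l - lam1 / 2.0) * q1


def representation_level(l: int, q1: float) -> float:
    """Representation level assigned to the l-th interval."""
    return l * q1


q1 = 8.0
for lam1 in (1.0, 0.75):
    bounds = [decision_level(l, q1, lam1) for l in range(4)]
    levels = [representation_level(l, q1) for l in range(4)]
    print(lam1, bounds, levels)
```

For a value of 1 each decision level lies midway between two representation levels; a value of 0.75 shifts every non-zero decision level upwards by one eighth of the step size, which enlarges the zero interval.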

The value of $\lambda_1$ can be tuned to the cost function that is used to measure the quantiser
performance. Here, two types of performance are considered.

(i) mse performance

The mean-squared-error (mse) is a familiar cost function to measure the quantiser
performance. In this case the value of $\lambda_1$ is determined by minimising the expectation value

$$E[(x - y_1)^2]. \qquad (7)$$

In order to minimise (7), each original dct-coefficient $x$ has to be mapped onto the nearest
representation level $y_1 = r_{1l}$. Thus, without applying the calculus for differentiation one can
conclude: the corresponding decision levels $d_{1l}$ that minimise (7) are defined by the arithmetic
mean values

$$d_{1l} = \frac{r_{1(l-1)} + r_{1l}}{2} \qquad (8)$$

for the given set of representation levels $r_{1l}$. The result in eq. (8) can be regarded as the first
half of the celebrated Lloyd-Max quantiser design rule described in [Lloyd-57] and [Max-60]
which in its second half requires each representation level $r_{1l}$ to coincide with the local
centroid of the corresponding bin $[d_{1l}, d_{1(l+1)})$. However, the latter can in general not be
achieved for the signal-independent representation levels of eq. (6) because the local centroids
depend on the probability distribution of the original dct-coefficients $x$, and are therefore
signal-dependent. With eqs. (6) and (8) it follows from eq. (5) that $\lambda_1 = 1$ is the solution for the
mse cost function.

(ii) rate-distortion performance

In a rate-distortion sense it is more suitable to minimise the mse term of eq. (7) subject to a
given bit rate constraint for the first generation bit stream $b_1$. In this case the bit rate needed to
encode the first generation dct-coefficients $y_1$ is not allowed to exceed a preset value $H$. Let
$H_1(\lambda_1)$ denote the resulting bit rate for an adjustment of the decision levels $d_{1l}$ according to
eq. (5). It then follows from the method of Lagrange multipliers [Heuser-82] that the optimum
value of $\lambda_1$ can be found by minimising the extended cost function

$$E[(x - y_1)^2] + \mu \cdot H_1(\lambda_1). \qquad (9)$$

The Lagrange multiplier $\mu$ in eq. (9) is determined by the constraint

$$H_1(\lambda_1) \le H. \qquad (10)$$

There are two extreme cases. For $H \to \infty$ the Lagrange multiplier approaches zero, $\mu \to 0$, and
eq. (9) coincides with eq. (7), i.e. no attention is paid to the resulting bit rate and only the mse
is minimised. For $H \to 0$ no dct-coefficients $y_1$ can be coded. As a consequence, the decision
level $d_{1l}$ already approaches infinity for $l = 1$, i.e. $d_{11} \to \infty$, so that all original coefficients $x$
are quantised to zero. If the latter case with $d_{11} \to \infty$ is only applied to particular blocks or to
a selected range of frequency indices, e.g. all high frequency dct-coefficients, then the
optimisation of eqs. (9) and (10) results in skipping of dct-coefficients.
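
The trade-off expressed by the extended cost function can be illustrated with the following Python sketch, which selects among a few candidate parameter values for a given multiplier. Several elements are assumptions introduced purely for illustration: the bit rate is approximated by the zeroth-order entropy of the amplitude levels rather than by actual MPEG-2 codeword lengths, the quantiser is taken to floor the scaled sample plus half the parameter, only non-negative samples are handled, and all function names are hypothetical.

```python
import math
from collections import Counter


def level_index(x: float, q1: float, lam1: float) -> int:
    """Amplitude level of a non-negative sample (assumed quantiser form)."""
    return math.floor(x / q1 + lam1 / 2.0)


def entropy_bits(levels) -> float:
    """Zeroth-order entropy in bit per sample, used here as a stand-in for the bit rate."""
    counts = Counter(levels)
    total = len(levels)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def extended_cost(samples, q1: float, lam1: float, mu: float) -> float:
    """Mean squared error plus mu times the rate estimate."""
    levels = [level_index(x, q1, lam1) for x in samples]
    mse = sum((x - l * q1) ** 2 for x, l in zip(samples, levels)) / len(samples)
    return mse + mu * entropy_bits(levels)


def best_lambda(samples, q1: float, candidates, mu: float) -> float:
    """Candidate with the smallest extended cost (first one wins on ties)."""
    return min(candidates, key=lambda lam1: extended_cost(samples, q1, lam1, mu))


samples = [0.55, 0.6, 0.1, 0.2]
print(best_lambda(samples, 1.0, [1.0, 0.75, 0.5], mu=0.0))  # 1.0
print(best_lambda(samples, 1.0, [1.0, 0.75, 0.5], mu=1.0))  # 0.75
```

With a multiplier of zero the pure mse criterion selects the value 1, whereas a larger multiplier favours a smaller value that maps more samples to zero.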

The first twenty-five frames of the CCIR 601 test signal 'mobile' have been coded to evaluate
both the mse and the rate-distortion performance of the quantiser specified in eqs. (3) and (4).
Two values of $\lambda_1$ have been used: a.) $\lambda_1 = 1$ and b.) $\lambda_1 = 0.75$. The weighting matrix
proposed in MPEG-2 test model TM5 [TM5-93] has been applied for a frequency dependent
quantisation. Fig. 6 shows the resulting peak-signal-to-noise-ratio (PSNR) as a function of $qscale_1$,
see eq. (2) for the meaning of $qscale_1$.



[Plot; vertical axis: PSNR [dB]]

Fig. 6: PSNR/mse performance of the quantiser of eqs. (3) and (4) for the test signal 'mobile'

In accordance with the theoretical results the largest PSNR values are achieved for $\lambda_1 = 1$.
The PSNR values drop by about 0.4 dB if the value is changed to $\lambda_1 = 0.75$. However,
$\lambda_1 = 0.75$ is the better choice in terms of rate-distortion performance for small and medium
bit rates as is revealed in Fig. 7. For a given distortion of e.g. PSNR = 30 dB the resulting bit
amount is approx. 450000 bit/frame in case of $\lambda_1 = 1$ and approx. 410000 bit/frame in case of
$\lambda_1 = 0.75$; this results in a bit saving of approx. 9% in the latter case. Conversely, for a given
bit amount of 450000 bit/frame the PSNR value can be increased by about 0.5 dB from 30 dB
to 30.5 dB when changing from $\lambda_1 = 1$ to $\lambda_1 = 0.75$. As explained above, when the bit rate is
infinite, again the rate-distortion performance of $\lambda_1 = 1$ is best. Hence, one can expect an
intersection between the rate-distortion curves of Fig. 7 when the bit rate is increased. This is
shown in Fig. 8. For bit rates larger than approx. 1 Megabit/frame, which corresponds to PSNR
values larger than 38 dB, the results are in favour of $\lambda_1 = 1$. However, the difference to the
result of $\lambda_1 = 0.75$ is rather small.

[Plot; vertical axis: PSNR [dB]]

Fig. 8: Rate-distortion performance in case of high bit rates for
the quantiser of eqs. (3) and (4), test signal 'mobile'

The value $\lambda_1 = 0.75$ is the 'intended' value for the quantiser of the MPEG-2 test model TM5.
Due to a simplified calculation of eqs. (3) and (4) involving integer rounding operations, the
value of $\lambda_1$ that is used in TM5 depends on the value of $qscale_1$; for further details the reader
is referred to the TM5 description in [TM5-93]. The functional relationship between $\lambda_1$ and
$qscale_1$ of TM5 is shown in Fig. 9.




[Plot: $\lambda_1$ (vertical axis, starting at 0.65) versus $qscale_1$ (horizontal axis, 2 to 62 in steps of 2)]

Fig. 9: MPEG-2 test model TM5, functional relationship between $\lambda_1$ and $qscale_1$

For $qscale_1 = 2$ the resulting lambda value is $\lambda_1 = 1$; this is fine because a small $qscale_1$ value
corresponds to a high bit rate, and then $\lambda_1 = 1$ is the best choice. With increased $qscale_1$ the
$\lambda_1$ values approach the 'intended' value 0.75. This value is exactly matched, i.e. $\lambda_1 = 0.75$, if
$qscale_1$ is a multiple of 4, e.g. $qscale_1 = 4, 8, 12, 16, \ldots$
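
The dependence described above can be reproduced with a small Python sketch. It assumes that the integer rounding offset has the form floor((3 * qscale + 2) / 4), applied before a division by 2 * qscale, as found in publicly available MPEG-2 reference encoder software; this specific form is an assumption and is not stated in the text. The function name is hypothetical.

```python
def tm5_effective_lambda(qscale: int) -> float:
    """Effective lambda implied by an assumed integer rounding offset.

    Assumption: level = floor((y + d) / (2 * qscale)) with d = (3 * qscale + 2) >> 2,
    so that the offset in units of one step is d / (2 * qscale) = lambda / 2.
    """
    offset = (3 * qscale + 2) >> 2
    return offset / qscale


for qscale in (2, 4, 6, 8, 10, 12, 62):
    print(qscale, round(tm5_effective_lambda(qscale), 4))
```

Under this assumption the sketch yields 1 for a quantiser scale of 2, exactly 0.75 for multiples of 4, and values slightly above 0.75 otherwise, in line with the behaviour described for Fig. 9.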

Of course, the experimental results give no evidence that $\lambda_1 = 0.75$ is the optimum value in
the rate-distortion sense of eqs. (9) and (10). The tuning of $\lambda_1$, or more generally, the
adjustment of the decision levels for a given set of representation levels to improve the rate-distortion
performance of the first generation quantiser is another issue and beyond this paper's scope.
In the following sections we concentrate on the second generation quantiser and investigate
how the degree of freedom that lies in the adjustment of the decision levels can be exploited
for efficient transcoding.

3. MPEG-2 compatible quantisation in a transcoder - the transcoding problem

In contrast to the first generation quantiser $Q_1$ that has access to the original dct-coefficients $x$,
the second generation quantiser $Q_2$ used in the transcoder has only access to the first
generation dct-coefficients $y_1$. Thus, $Q_2$ maps the first generation dct-coefficients onto the second
generation,

$$y_2 = Q_2(y_1). \qquad (11)$$

With

$$y_1 = Q_1(x) \qquad (12)$$

for the first generation quantiser, the relationship between the original dct-coefficients and the
second generation becomes

$$y_2 = Q_2(Q_1(x)). \qquad (13)$$

Ideally, the result of cascaded quantisation in eq. (13) should be identical to the output of a
reference quantiser $Q_{2,ref}$ that has access to the original dct-coefficients,

$$y_{2,ref} = Q_{2,ref}(x). \qquad (14)$$

The reference quantiser is used in a standalone encoder that by-passes the first generation
stage and directly compresses the original signal to the desired bit rate of the second
generation. For $y_2 = y_{2,ref}$ there is no additional loss due to transcoding.

Therefore, we start analysing the transcoding problem with the investigation for which cases
$y_2 = y_{2,ref}$ is achievable. As an example, Fig. 10 shows the quantiser characteristics of the
first generation and of the reference for transcoding. In this example, the two basic cases of
transcoding occur.

Case 1: In the first case the half-open interval $[d_{1l}, d_{1(l+1)})$ on the x-axis is considered. Each
$x \in [d_{1l}, d_{1(l+1)})$ is mapped by the first generation quantiser onto the representation level
$y_1 = Q_1(x) = r_{1l}$. As a consequence, no matter on which second generation value this
representation level is mapped by the transcoder's characteristic $y_2 = Q_2(y_1 = r_{1l})$, the resulting
characteristic of eq. (13) will always be constant over the entire interval $[d_{1l}, d_{1(l+1)})$.
However, the reference quantiser of eq. (14) changes the representation level over the interval
$[d_{1l}, d_{1(l+1)})$. Each $x \in [d_{1l}, d_{L,ref})$ is mapped onto the representation level
$y_{2,ref} = Q_{2,ref}(x) = r_{2(L-1)}$, and each $x \in [d_{L,ref}, d_{1(l+1)})$ is mapped onto $r_{2L}$. The
dilemma for the transcoder is that only the entire interval $[d_{1l}, d_{1(l+1)})$ can be mapped
either on $r_{2(L-1)}$ or on $r_{2L}$. Clearly, no matter how the final choice is done, $y_2 = y_{2,ref}$ is not
achievable for all $x \in [d_{1l}, d_{1(l+1)})$.

Case 2: In the second case the interval $[d_{1(l+1)}, d_{1(l+2)})$ on the x-axis is considered. In this
case the reference quantiser maps the entire interval $[d_{1(l+1)}, d_{1(l+2)})$ onto the representation
level $r_{2L}$. Hence, $y_2 = y_{2,ref}$ is achievable for all $x \in [d_{1(l+1)}, d_{1(l+2)})$ if the transcoder
applies the mapping

$$r_{2L} = y_2 = Q_2(y_1 = r_{1(l+1)}). \qquad (15)$$

Fig. 10: Quantiser characteristics of first generation and of the reference for transcoding

Case 1 must be avoided in order to fulfil $y_2 = y_{2,ref}$ for the whole x-axis. This is accomplished
if the set of decision levels $\{d_{L,ref}\}$ of the reference quantiser forms a subset of the set of
decision levels $\{d_{1l}\}$ of the first generation quantiser,

$$\{d_{L,ref}\} \subseteq \{d_{1l}\}. \qquad (16)$$

For the parametric description of the decision levels in eq. (5), condition (16) can be translated
into an equivalent condition that involves the quantisation step-sizes $q_1$ and $q_2$ of the first and
the second generation, respectively,

$$\frac{q_2}{q_1} = \frac{n_0 + 1 - \lambda_1/2}{1 - \lambda_{2,ref}/2} = k, \qquad n_0, k \in \{1, 2, 3, \ldots\}. \qquad (17)$$

Eq. (17) reads as follows. For given parameters $\lambda_1$ and $\lambda_{2,ref}$ of the first generation and the
reference quantiser, respectively, $y_2 = y_{2,ref}$ is achievable for the whole x-axis if there exists
a positive integer $n_0$ such that the middle term of (17) results in another positive integer $k$
which at the same time has to be equal to the ratio $q_2/q_1$ of the step-sizes.

As an example, in the case of the mse cost function for the first generation and the reference
quantiser with $\lambda_1 = \lambda_{2,ref} = 1$, eq. (17) becomes $q_2/q_1 = 2 \cdot n_0 + 1 = k$. This means that
the ratio $q_2/q_1$ has to be equal to an odd valued integer, e.g. 3, 5, 7, ...
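
A check of this kind can be sketched in Python with exact rational arithmetic. The sketch assumes that condition (17) has the form: the step-size ratio equals (n0 + 1 - lambda_1/2) / (1 - lambda_2,ref/2) and equals a positive integer k, with n0 a positive integer. This form, the function name and the example step sizes are assumptions for illustration.

```python
from fractions import Fraction


def lossless_ratio_check(q1, q2, lam1, lam2_ref):
    """Return (k, n0) if the assumed form of condition (17) holds, else None."""
    q1, q2, lam1, lam2_ref = (Fraction(v) for v in (q1, q2, lam1, lam2_ref))
    k = q2 / q1
    # Solve the assumed condition for n0 and test whether both values are positive integers.
    n0 = k * (1 - lam2_ref / 2) - 1 + lam1 / 2
    if k.denominator == 1 and n0.denominator == 1 and k >= 1 and n0 >= 1:
        return int(k), int(n0)
    return None


print(lossless_ratio_check(8, 24, 1, 1))  # (3, 1): odd ratio
print(lossless_ratio_check(8, 16, 1, 1))  # None: even ratio
print(lossless_ratio_check(8, 40, 1, 1))  # (5, 2)
```

For both parameters equal to 1 the check succeeds only for odd step-size ratios of 3 or more, matching the example above.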

In general for arbitrary $q_1$ and $q_2$, condition (17) cannot be met, and as a consequence, there
is an additional loss due to transcoding. This loss can be described by Case 1. It can be derived
from Fig. 10 that the mismatch between $y_2$ and $y_{2,ref}$ is especially large if the decision level
$d_{L,ref}$ coincides with the centre of the interval $[d_{1l}, d_{1(l+1)})$. For the above example of
$\lambda_1 = \lambda_{2,ref} = 1$ this is indicated by an even ratio $q_2/q_1 = 2, 4, 6, \ldots$

However, Case 1 does not occur throughout if eq. (17) is not fulfilled. The decision intervals of
the first generation quantiser can be partitioned into two classes, those which belong to Case 1,
e.g. $[d_{1l}, d_{1(l+1)})$ in Fig. 10, and those which belong to Case 2, e.g. $[d_{1(l+1)}, d_{1(l+2)})$ in
Fig. 10. In general, the resulting two classes do not have an equal number of assigned intervals.
The percentage of intervals that belong to Case 1 depends on $q_1$ and $q_2$ as well as on $\lambda_1$ and
$\lambda_{2,ref}$. The percentage of Case 1 intervals decreases asymptotically to zero as the ratio $q_2/q_1$
tends to infinity, compare Fig. 10. Indeed, $q_2/q_1 = \infty$ can be regarded as a special case of eq. (17)
for $n_0, k \to \infty$; thus, Case 2 intervals are guaranteed throughout. For $q_1 > 0$ this results in
$q_2 = \infty$ which can be interpreted as skipping of dct-coefficients in the transcoder and as well
in the reference quantiser. However, skipping of dct-coefficients may not be the default for a
sophisticated reference quantiser to achieve a good rate-distortion performance, e.g. a
reference quantiser defined by eqs. (3) and (4). Therefore, techniques are required that minimise the
additional loss due to transcoding in Case 1 intervals. This problem will be addressed in the
following section.
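
The share of Case 1 intervals for a given parameter set can be estimated with the following Python sketch. It assumes decision levels of the form (l - lambda/2) times the step size for both the first generation and the reference quantiser, and counts a first generation interval as Case 1 if a reference decision level lies strictly inside it. The function name and the finite number of intervals examined are illustrative assumptions.

```python
import math
from fractions import Fraction


def case1_fraction(q1, q2, lam1, lam2_ref, n_intervals):
    """Fraction of the first n_intervals first-generation intervals that are Case 1."""
    q1, q2, lam1, lam2_ref = (Fraction(v) for v in (q1, q2, lam1, lam2_ref))

    def d1(l):
        # First generation decision level (assumed form), zero for l = 0.
        return Fraction(0) if l == 0 else (l - lam1 / 2) * q1

    case1 = 0
    for l in range(n_intervals):
        low, high = d1(l), d1(l + 1)
        # Smallest reference index L >= 1 whose decision level exceeds the lower bound.
        L = max(1, math.floor(low / q2 + lam2_ref / 2) + 1)
        if (L - lam2_ref / 2) * q2 < high:
            case1 += 1
    return Fraction(case1, n_intervals)


print(case1_fraction(1, 2, 1, 1, 8))  # 1/2: even ratio, every other interval is Case 1
print(case1_fraction(1, 3, 1, 1, 9))  # 0: odd ratio, no Case 1 intervals
```

Exact fractions are used so that decision levels that coincide are recognised as aligned rather than being misclassified through rounding.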

4. Design of the second generation quantiser Q2 used in the transcoder 

From the results of the previous section it follows that the mapping of eq. (15) should be
applied in the transcoder for Case 2 intervals. Therefore, this section concentrates on Case 1
intervals. Resuming the discussion of Case 1, the transcoder has to take a binary decision for the
interval $[d_{1l}, d_{1(l+1)})$ in Fig. 10. As the interval $[d_{1l}, d_{1(l+1)})$ is represented by the first
generation level $y_1 = Q_1(x) = r_{1l}$ in the transcoder, the decision is to map $y_1 = r_{1l}$ either
onto the second generation representation level $y_2 = r_{2(L-1)}$ or $y_2 = r_{2L}$. This
decision can be taken based on the minimisation of a cost function. Here, two cost functions are
considered.

4.1 MSE cost function 

Similar to eq. (7) in section 2, the mean-squared-error (mse) cost function can be applied,
resulting in the expectation value

$$E[(x - y_2)^2]. \qquad (18)$$

Thus, the objective of this cost function is to minimise the mse between the original
dct-coefficients $x$ and the second generation coefficients $y_2$ that follow from the transcoder characteristic
$y_2 = Q_2(y_1)$. For the mse cost function of eq. (18), the corresponding reference quantiser
$Q_{2,ref}$ is defined by eqs. (3) and (4) with the parameter $\lambda_{2,ref} = 1$ and the second generation
step-size $q_2$. Hence, the decision level $d_{L,ref}$ in Fig. 10 is related to the representation levels
by eq. (8), i.e.

$$d_{L,ref} = \frac{r_{2(L-1)} + r_{2L}}{2}. \qquad (19)$$



As the transcoder has access to the first generation coefficients $y_1$, this information can be
exploited to minimise the term of eq. (18). Therefore, the mse cost function is expanded by
introducing a conditional expectation value that depends on $y_1$ (eq. (20)).

With eq. (20) the resulting binary decision rule can be written for the Case 1 interval
$[d_{1l}, d_{1(l+1)})$ of Fig. 10 as follows,

$$y_2 = Q_2(y_1 = r_{1l}) = \begin{cases} r_{2(L-1)} & \text{if } E[(x - r_{2(L-1)})^2 \mid y_1 = r_{1l}] < E[(x - r_{2L})^2 \mid y_1 = r_{1l}] \\ r_{2L} & \text{if } E[(x - r_{2(L-1)})^2 \mid y_1 = r_{1l}] > E[(x - r_{2L})^2 \mid y_1 = r_{1l}] \end{cases} \tag{21}$$



The conditional expectation values in eq. (21) depend on the probability density function (pdf)
$p(x)$ of the original dct-coefficients $x$. The pdf determines the probability that a single value
$x$ falls in the interval $[d_{1l}, d_{1(l+1)})$ and is mapped onto the first generation level
$r_{1l}$,

$$P_{1l} = \int_{d_{1l}}^{d_{1(l+1)}} p(x)\,dx. \tag{22}$$

With the definition of a local centroid,

$$c_{1l} = \frac{1}{P_{1l}} \int_{d_{1l}}^{d_{1(l+1)}} x \cdot p(x)\,dx, \tag{23}$$

and the decision level of eq. (19), the decision rule of eq. (21) can be re-stated after a
straightforward calculation as

$$y_2 = Q_2(y_1 = r_{1l}) = \begin{cases} r_{2(L-1)} & \text{if } c_{1l} < d_{L,ref} \\ r_{2L} & \text{if } c_{1l} > d_{L,ref} \end{cases} \tag{24}$$

In the case of $c_{1l} = d_{L,ref}$, no unique decision between $r_{2(L-1)}$ and $r_{2L}$ can
be made. However, the decision might then be in favour of the lower level, i.e. $r_{2(L-1)}$,
because in general lower amplitude levels correspond to MPEG-2 codewords of smaller length.
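The centroid-based rule of eq. (24) can be sketched numerically. The following Python fragment is only an illustration under stated assumptions: the function names, the midpoint integration and all numbers (interval [21, 29), candidate levels 16 and 32, a decay constant of 0.3) are hypothetical and not taken from the text, and a tie is resolved towards the lower level as suggested above.

```python
import math


def local_centroid(pdf, d_lo, d_hi, steps=10_000):
    """Midpoint-rule estimate of the local centroid of eq. (23) on [d_lo, d_hi)."""
    h = (d_hi - d_lo) / steps
    mass = 0.0    # proportional to P_1l of eq. (22)
    moment = 0.0  # proportional to the numerator of eq. (23)
    for k in range(steps):
        x = d_lo + (k + 0.5) * h
        w = pdf(x)
        mass += w
        moment += x * w
    return moment / mass  # the common factor h cancels


def mse_decision(centroid, r_low, r_high):
    """Eq. (24); a tie goes to the lower level (an assumption following the text)."""
    d_ref = 0.5 * (r_low + r_high)  # decision level of eq. (19)
    return r_low if centroid <= d_ref else r_high


# Hypothetical numbers: interval [21, 29), candidate levels 16 and 32.
uniform = lambda x: 1.0
laplacian = lambda x: math.exp(-0.3 * x)
print(mse_decision(local_centroid(uniform, 21.0, 29.0), 16, 32))
print(mse_decision(local_centroid(laplacian, 21.0, 29.0), 16, 32))
```

With the uniform density the centroid sits at the interval centre (25), above the decision level 24, so the upper level is chosen; the decaying density pulls the centroid below 24 and the lower level is chosen.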




4.1.1 Implementation of the mse cost function 

There are several ways to implement (24) in a transcoder. Here, we outline a straightforward
implementation of the mse cost function as a quantiser characteristic $y_2 = Q_2(y_1)$. For each
first generation coefficient $y_1 = r_{1l}$, the transcoder can compute the corresponding interval
$[d_{1l}, d_{1(l+1)})$ on the x-axis, e.g. for the first generation quantiser of eqs. (3), (4) one
obtains

$$d_{1l} = q_1 \cdot \left(l - \frac{\lambda_1}{2}\right), \tag{25}$$

and

$$d_{1(l+1)} = d_{1l} + q_1. \tag{26}$$

In passing we note that the first generation quantiser step-size $q_1$ required in eqs. (25) and
(26) has to be transmitted as side information in any MPEG-2 bit stream. However, the parameter
$\lambda_1$ is not specified in MPEG-2, and must therefore be additionally signalled, e.g. as user
data in the first generation bit stream, see [MPEG-2] for the definition of user data.

Having computed the interval $[d_{1l}, d_{1(l+1)})$, the corresponding second generation
representation levels can be determined with the reference quantiser $Q_{2,ref}$,

$$r_{2(L-1)} = Q_{2,ref}(x = d_{1l}) = \left\lfloor \frac{d_{1l}}{q_2} + \frac{\lambda_{2,ref}}{2} \right\rfloor \cdot q_2 \tag{27}$$

and

$$r_{2L} = Q_{2,ref}(x = d_{1(l+1)}) = \left\lfloor \frac{d_{1(l+1)}}{q_2} + \frac{\lambda_{2,ref}}{2} \right\rfloor \cdot q_2 \tag{28}$$

with $\lambda_{2,ref} = 1$ for the mse cost function. With eqs. (27) and (28), the corresponding
decision level $d_{L,ref}$ can be calculated from eq. (19).
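The steps of eqs. (25) to (28) and (19) can be sketched as follows. This is a minimal illustration, assuming that the reference quantiser of eq. (3) has the floor form floor(x/q + lambda/2) * q for non-negative x; the function names and the numeric values are hypothetical.

```python
import math


def first_gen_interval(l, q1, lam1):
    """Decision interval [d_1l, d_1(l+1)) of eqs. (25) and (26) for an index l >= 1."""
    d_lo = q1 * (l - lam1 / 2.0)
    return d_lo, d_lo + q1


def ref_quantise(x, q2, lam2_ref=1.0):
    """Reference quantiser for x >= 0, assuming the floor form of eq. (3)."""
    return math.floor(x / q2 + lam2_ref / 2.0) * q2


def candidates(l, q1, lam1, q2):
    """Return (r_low, r_high, d_ref) for the mse case, i.e. lam2_ref = 1."""
    d_lo, d_hi = first_gen_interval(l, q1, lam1)
    r_low = ref_quantise(d_lo, q2)     # eq. (27)
    r_high = ref_quantise(d_hi, q2)    # eq. (28)
    d_ref = 0.5 * (r_low + r_high)     # eq. (19)
    return r_low, r_high, d_ref


# Hypothetical values: q1 = 8, lambda_1 = 0.75, q2 = 16.
print(first_gen_interval(3, 8, 0.75))
print(candidates(3, 8, 0.75, 16))
print(candidates(2, 8, 0.75, 16))
```

When both candidate levels coincide, as for l = 2 in this example, the interval is a Case 2 interval and no binary decision is needed.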

The pdf $p(x)$ is needed to calculate the local centroid $c_{1l}$ according to eqs. (22) and (23).
In order to limit the amount of additional side information, the transcoder can apply a model
description for $p(x)$, involving only a few parameters. A parametric model suitable to describe
the statistics of the original dct-coefficients $x$ will be detailed and validated in Section 5.

Now, all parameters are available, i.e. $r_{2(L-1)}$, $r_{2L}$, $d_{L,ref}$ and $c_{1l}$, to apply
the decision rule of eq. (24). A more compact form of (24) in the sense of a quantiser
characteristic can be stated as follows,

$$y_2 = \left\lfloor \frac{c_{1l}}{q_2} + \frac{1}{2} \right\rfloor \cdot q_2. \tag{29}$$

Eq. (29) specifies the second generation coefficient $y_2$ as a function of the local centroid
$c_{1l}$. However, as we are interested rather in the transcoder's quantiser characteristic
$y_2 = Q_2(y_1)$, the local centroid has to be related to the first generation coefficient $y_1$.
As in eq. (25), the local centroid $c_{1l}$ can be specified by a parameter $\mu_{1l}$ that relates
the centroid to the decision level $d_{1l}$,

$$c_{1l} = d_{1l} + \frac{\mu_{1l}}{2} \cdot q_1. \tag{30}$$

Clearly, for $\mu_{1l} = 1$ the local centroid coincides with the centre of the decision interval
$[d_{1l}, d_{1(l+1)})$ of length $q_1$. From eqs. (25) and (30) one deduces

$$c_{1l} = (y_1 = r_{1l}) + \frac{\mu_{1l} - \lambda_1}{2} \cdot q_1. \tag{31}$$

After inserting (31) in (29) and by defining

$$\lambda_2 = 1 + (\mu_{1l} - \lambda_1) \cdot \frac{q_1}{q_2}, \tag{32}$$

one obtains the desired result,

$$y_2 = Q_2(y_1) = \left\lfloor \frac{y_1}{q_2} + \frac{\lambda_2}{2} \right\rfloor \cdot q_2. \tag{33}$$

Thus, the transcoder's quantiser characteristic of eq. (33) is essentially the same as the one of
the first generation of eq. (3). However, as can be seen from (32), the parameter $\lambda_2$ is in
general not a constant and depends on the actual value of $\mu_{1l}$, which can change with varying
$y_1$. Thus, in contrast to the first generation parameter $\lambda_1$, the second generation
parameter $\lambda_2$ depends in general on the actual input value $y_1$ of the transcoder's
quantiser.

Nevertheless, in special cases depending on the pdf $p(x)$, the parameter $\lambda_2$ becomes a
constant, e.g. for a uniform pdf, i.e. $p(x) = const$, it follows from eqs. (22), (23) and (30)
that $\mu_{1l} = 1$ throughout, and as a consequence of eq. (32) $\lambda_2$ also becomes a
constant. Another special case with constant parameter $\lambda_2$ occurs if
$\mu_{1l} = \lambda_1$ holds in eq. (32). In this case each first generation coefficient $y_1$
coincides with the corresponding local centroid $c_{1l}$, see eq. (31), resulting in
$\lambda_2 = 1$ for the second generation.
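The equivalence between rounding the centroid, eq. (29), and requantising $y_1$ with the parameter of eq. (32), eq. (33), can be checked with a short sketch. The function names and the values q1 = 8, q2 = 16, lambda_1 = 0.75 and mu = 0.6 are hypothetical, and non-negative indices are assumed.

```python
import math


def lambda2(mu, lam1, q1, q2):
    """Second generation parameter of eq. (32)."""
    return 1.0 + (mu - lam1) * q1 / q2


def requantise_via_centroid(l, mu, lam1, q1, q2):
    """Eq. (29) applied to the centroid of eq. (30), with d_1l from eq. (25)."""
    centroid = q1 * (l - lam1 / 2.0) + mu * q1 / 2.0
    return math.floor(centroid / q2 + 0.5) * q2


def requantise_via_lambda2(l, mu, lam1, q1, q2):
    """Eq. (33) applied directly to y1 = l * q1."""
    y1 = l * q1
    return math.floor(y1 / q2 + lambda2(mu, lam1, q1, q2) / 2.0) * q2


# Hypothetical values: q1 = 8, q2 = 16, lam1 = 0.75, mu = 0.6.
for l in range(1, 11):
    a = requantise_via_centroid(l, 0.6, 0.75, 8, 16)
    b = requantise_via_lambda2(l, 0.6, 0.75, 8, 16)
    print(l, a, b)
```

Both columns agree for every index, and the two special cases named above give lambda2(1.0, 0.75, 8, 16) = 1.125 for the uniform pdf and lambda2(0.75, 0.75, 8, 16) = 1 when the centroid parameter equals lambda_1.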




4.2 From the mse to the map cost function

In addition to eq. (33), there is another instructive interpretation of the mse decision rule. We
therefore have to re-state eq. (24) by substituting the local centroid $c_{1l}$ with the parameter
$\mu_{1l}$ according to eq. (30). After re-ordering one obtains

$$y_2 = Q_2(y_1 = r_{1l}) = \begin{cases} r_{2(L-1)} & \text{if } \dfrac{d_{L,ref} - d_{1l}}{q_1} > \dfrac{\mu_{1l}}{2} \\ r_{2L} & \text{if } \dfrac{d_{L,ref} - d_{1l}}{q_1} < \dfrac{\mu_{1l}}{2} \end{cases} \tag{34}$$

With Fig. 10, the term $(d_{L,ref} - d_{1l})/q_1$ in eq. (34) can be interpreted as the
a-posteriori probability $P_{uni}$ that the original dct-coefficient $x$ falls in the interval
$[d_{1l}, d_{L,ref})$ given the first generation coefficient $y_1 = r_{1l}$ in the case of a
uniform pdf $p(x) = const$ over the interval $[d_{1l}, d_{1(l+1)})$ of length $q_1$, i.e.

$$P_{uni} = \Pr[x \in [d_{1l}, d_{L,ref}) \mid y_1 = r_{1l}] = \frac{d_{L,ref} - d_{1l}}{q_1}. \tag{35}$$

This is a curious result. Although the pdf $p(x)$ is non-uniform in general, eq. (34) suggests one
computes $P_{uni}$ and compares this value with the threshold $\mu_{1l}/2$.



In general, the a-posteriori probability $P_{map}$ depends on the pdf as follows,

$$P_{map} = \Pr[x \in [d_{1l}, d_{L,ref}) \mid y_1 = r_{1l}] = \frac{1}{P_{1l}} \int_{d_{1l}}^{d_{L,ref}} p(x)\,dx, \tag{36}$$

where the denominator $P_{1l}$ is defined as in eq. (22). The complementary a-posteriori
probability is given by

$$\bar{P}_{map} = \Pr[x \in [d_{L,ref}, d_{1(l+1)}) \mid y_1 = r_{1l}] = 1 - P_{map}. \tag{37}$$

For the special case of $p(x) = const$ the a-posteriori probability $P_{map}$ of eq. (36) becomes
identical to $P_{uni}$ of eq. (35). Also, the parameters that are related to the local centroids
then become $\mu_{1l} = 1$ throughout. Hence for this special case, the decision rule of eq. (34)
can be re-stated with eqs. (36) and (37) as

$$y_2 = Q_2(y_1 = r_{1l}) = \begin{cases} r_{2(L-1)} & \text{if } P_{map} > \bar{P}_{map} \\ r_{2L} & \text{if } P_{map} < \bar{P}_{map} \end{cases} \tag{38}$$

In general for a non-uniform pdf, i.e. $p(x) \neq const$, the decision rule of eq. (38) does not
coincide with the mse decision rule of eq. (34). The decision in (38) is based on the maximum
a-posteriori (map) probability which is either $P_{map}$ or $\bar{P}_{map}$. Therefore, eq. (38)
is called the map decision rule.
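A numerical sketch of the map decision rule of eq. (38) is given below. It is an illustration only: the integration routine, the tie-break towards the lower level, and all numbers (interval [21, 29), decision level 24, levels 16 and 32, decay constant 0.15) are assumptions made for this example.

```python
import math


def interval_mass(pdf, lo, hi, steps=10_000):
    """Midpoint-rule integral of pdf over [lo, hi)."""
    h = (hi - lo) / steps
    return h * sum(pdf(lo + (k + 0.5) * h) for k in range(steps))


def map_decision(pdf, d_lo, d_hi, d_ref, r_low, r_high):
    """Map rule of eq. (38); returns the chosen level and P_map of eq. (36)."""
    p_1l = interval_mass(pdf, d_lo, d_hi)           # eq. (22)
    p_map = interval_mass(pdf, d_lo, d_ref) / p_1l  # eq. (36)
    p_map_bar = 1.0 - p_map                         # eq. (37)
    # A tie is resolved towards the lower level; this is an assumption.
    return (r_low if p_map >= p_map_bar else r_high), p_map


uniform = lambda x: 1.0
laplacian = lambda x: math.exp(-0.15 * x)
print(map_decision(uniform, 21.0, 29.0, 24.0, 16, 32))
print(map_decision(laplacian, 21.0, 29.0, 24.0, 16, 32))
```

For the decaying density in this example the local centroid of eq. (23) lies at about 24.22, above the decision level 24, so the mse rule of eq. (24) would select the upper level, whereas the map rule selects the lower one because slightly more than half of the probability mass lies below 24. This illustrates that the two rules need not coincide for a non-uniform pdf.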

Another important difference between the mse decision rule of eq. (34) and the map decision
rule of eq. (38) is that the reference quantiser $Q_{2,ref}$ is fixed in the mse case but
arbitrary in the map case. Note that $Q_{2,ref}$ determines the second generation decision level
$d_{L,ref}$ in eqs. (36) and (37), and has therefore a significant impact on the a-posteriori
probabilities. As an example, the map decision rule can be applied to the class of reference
quantisers $Q_{2,ref}$ which is spanned by the parameter $\lambda_{2,ref}$. In the special case of
$\lambda_{2,ref} = 1$, one obtains the reference quantiser of the mse cost function; however, as
explained earlier, the map decision rule will then only coincide with the mse decision rule if,
additionally, the pdf is constant.

The degree of freedom to select the reference quantiser for the map decision rule can be
exploited by specifying a characteristic $Q_{2,ref}$ that results in a sophisticated
rate-distortion performance. Thus, the map decision rule can take into account not only the mse but
also the resulting bit amount, which is more suitable in a rate-distortion sense compared with the
mse decision rule.

As the mse decision rule of eq. (34) minimises the cost function of eq. (18), one may now
consider what cost function is minimised by the map decision rule of eq. (38). It is not difficult
to verify that the map decision rule minimises the cost function of eq. (39), whose quantities are
defined as in eqs. (13) and (14), respectively. Interestingly, it is again a mean-squared-error
that we end up with. However, the mse of eq. (39) is more suitable in a rate-distortion sense than
the mse of eq. (18).

Finally in this section, we note that both the mse and the map cost function belong to the family
of Bayesian cost functions [Melsa-Cohn].

5. A parametric model to describe the statistics of the dct-coefficients

The transcoder needs to know the pdf $p(x)$ of the original dct-coefficients $x$ in order to apply
both the mse and the map cost functions. Basically, there are two possible approaches. In the
first case, $p(x)$ is transmitted as additional side information along with the first generation
bit stream. In the second case, no additional side information is transmitted, and $p(x)$ is
estimated from the first generation dct-coefficients $y_1$. A parametric model is required that
involves only a few parameters to limit the amount of additional side information in the first
case. Also in the second case a parametric model with only a few parameters is needed because the
reconstructed coefficients may not cover a sufficient amplitude range and/or their number may not
be sufficient to achieve a reliable estimate for many parameters; this problem is known as 'context
dilution' [Rissanen et al. - 96]. In the following steps, a common parametric model will be
derived that is suitable for both cases.

In the first step, we propose to model $p(x)$ as a Laplacian-like pdf, eq. (40), which can be
described by a single positive parameter $\alpha$. A priori, the AC-coefficients of an 8x8 block
cannot be assumed to share the same distribution. Therefore, an individual $\alpha$ value is
specified for each AC-frequency index, resulting in 63 parameters in total.

Due to the discrete nature of the dct-coefficients, the probability $P_{1l}$ to encounter a
representation level $y_1 = r_{1l} = l \cdot q_1$ is given by eq. (22). For the special case of
$q_1 = 1$, $P_{1l}$ can be considered the probability of the original dct-coefficients $x$. A
histogram for ten consecutive frames of the CCIR 601 test signal 'mobile' has been computed in
order to check whether the parametric model of (40) is suitable for the original dct-coefficients
$x$. Due to the symmetry assumption for positive and negative amplitudes in eq. (40), the relative
frequencies $f_l = N_l/N$ of absolute amplitude levels $|x| = l$ have been measured, where
$0 \le l \le 1024$ and $N = 10 \times 6480$ for each AC frequency index. The results are similar
for all 63 AC frequencies; three examples for selected horizontal and vertical frequency indices
are shown in Fig. 11. Each curve has its maximum at $|x| = l = 1$ and decays rapidly with
increasing $l$. The maximum at $|x| = l = 1$ is larger and more sharply peaked for higher
frequencies, indicating a decreasing variance for increasing frequencies. Similar results have been
obtained for other test signals. Qualitatively, the type of curve shown in Fig. 11 can be generated
with eq. (40) by appropriate adjustment of the parameter $\alpha$.

In order to get the best parameter fit one can take advantage of the fact that the transcoder
needs to know the pdf $p(x)$ only for $x$ values that are mapped onto non-zero representation
levels $y_1 \neq 0$ by the first generation quantiser. This is because the first generation
coefficients $y_1 = 0$ are always mapped onto the second generation coefficients $y_2 = 0$. Hence,
the pdf has to be modelled only for $|x| \ge d_{11}$, where $d_{11}$ is the smallest positive first
generation decision level.







Fig. 11: Histogram for absolute amplitude levels of original dct-coefficients
for different horizontal and vertical frequency indices, test signal 'mobile'



For the parametric description of eq. (5), one obtains $d_{11} = (1 - \lambda_1/2) \cdot q_1$. In
order to become independent of the first generation quantisation step-size $q_1$, the value of
$d_{11}$ can be set to $d_{11} = (1 - \lambda_1/2) \cdot q_{min}$, where $q_{min}$ is the smallest
possible step-size that complies with eq. (2). Therefore, the pdf is modelled according to eq. (41).

The parameter $\beta$ in eq. (41) takes into account that the integral $\int p(x)\,dx$ is smaller
than one when calculated only over the range $|x| \ge d_{11}$. The value of $\beta$ can be used to
match the condition

$$\int_{|x| \ge d_{11}} p(x)\,dx = 1 - \int_{|x| < d_{11}} p(x)\,dx = 1 - \sum_{l < l_0} f_l = 1 - f_0. \tag{42}$$

The right hand side of (42) is determined by the measured relative frequencies $f_l$ of absolute
values $|x| = l$ that are commonly mapped onto $y_1 = 0$ by the first generation quantiser. The
latter is indicated in eq. (42) by the index $l_0$ which depends on the decision level $d_{11}$.
Thus, instead of calculating the sum $\sum_{l < l_0} f_l$ that appears in the right hand side of
(42) one can also measure the relative frequency $f_0$ of the event $y_1 = 0$.

It is important to notice that $\beta$ is not needed to apply the parametric model of eq. (41) to
both the mse and the map cost function. During the calculation of the local centroids with eqs.
(22) and (23), the value of $\beta$ is cancelled out as it appears in the numerator and in the
denominator of eq. (23). Cancellation also occurs for the map cost function during the calculation
of the a-posteriori probabilities of eqs. (36) and (37). Hence, only the value of $\alpha$ is
needed in the transcoder. As a consequence, eq. (41) can provide a better fit for the curves shown
in Fig. 11 than eq. (40), resulting in a more accurate estimate of the parameter $\alpha$ with no
increase of side information.

The probabilities $P_{1l}$ can be evaluated for the parametric description of the decision levels
of eq. (5) by inserting the parametric model of eq. (41) in eq. (22). After a straightforward
calculation that also takes into account condition (42) one obtains eq. (43).

We recall that eq. (43) specifies the probability that one observes a first generation coefficient
$y_1 = r_{1l} = l \cdot q_1$. It is interesting to notice that the first generation quantiser
parameter $\lambda_1$ is not needed in eq. (43) but only the step-size $q_1$. Due to the symmetry
assumption of the parametric model (41) one gets $P_{1l} = P_{1(-l)}$ as an outcome of (43). For
$l = 0$ the probability is set to $P_{10} = f_0$. The largest index in (43), i.e. $L$, can be set
to the largest possible value $L = 1024$ for AC-coefficients or it may be set more accurately to
the largest value that is actually encountered in the transcoder. While eq. (43) forms the basis to
estimate the parameter $\alpha$ from the first generation coefficients $y_1$, we need a
corresponding equation to estimate $\alpha$ from the original coefficients $x$. As already
mentioned, the original dct-coefficients result as a special case for the step-size $q_1 = 1$,
i.e. $x = y_1 = r_{1l} = l \cdot 1$. When combined with condition (42), eq. (43) has to be modified
accordingly, which yields eq. (44).

As in the previous case, the probabilities for indices in the range $|l| < l_0$ that are not
defined by eq. (44) can be set to the value of the corresponding relative frequencies of the
original dct-coefficients $x$.

We first concentrate on estimating the parameter $\alpha$ according to eq. (43). Ideally, the
model probabilities $P_{1l}$ of eq. (43) should coincide with the relative frequencies $f_l$ of the
first generation coefficients $y_1 = r_{1l} = l \cdot q_1$ that can be measured in the transcoder.
However, there is only one free parameter in the model, i.e. $\alpha$. Therefore, $P_{1l} = f_l$ is
in general not achievable for each index $l$. A coding argument can be used as a guideline for
adjusting $\alpha$. If the parametric model of (43) were applied to encode the first generation
coefficients $y_1$, then the minimum average codeword length $cwl$ would be

$$cwl = -\sum_l f_l \cdot \log P_{1l} \;\ge\; -\sum_l f_l \cdot \log f_l = ent. \tag{45}$$

$cwl$ is given in the unit 'bit per coefficient' if the logarithm is taken relative to the base of
2 in eq. (45). The right hand side of (45) specifies the (first order) source entropy $ent$ which
is the lower bound for $cwl$ that can only be reached if $P_{1l} = f_l$ holds throughout. The goal
is now to adjust the parameter $\alpha$ of the probabilities $P_{1l}$ such that the resulting $cwl$
is minimised and is as close as possible to the source entropy $ent$. It is further interesting to
notice that $cwl$ can also be written as

$$cwl = -\sum_l f_l \cdot \log P_{1l} = -\frac{1}{N} \cdot \log P(y_{11}, y_{12}, \ldots, y_{1N}), \tag{46}$$

where $P(y_{11}, y_{12}, \ldots, y_{1N})$ specifies the joint probability that results from the
parametric model (43) for all first generation coefficients arranged in some scan order
$y_{11}, y_{12}, \ldots, y_{1N}$. As a consequence of (46), the minimisation of $cwl$ by adjusting
$\alpha$ coincides with the maximisation of $P(y_{11}, y_{12}, \ldots, y_{1N})$ for the observed
coefficients $y_{11}, y_{12}, \ldots, y_{1N}$. Thus, the parameter $\alpha$ is determined by a
maximum likelihood (ML) estimation.

The ML-estimate for $\alpha$ can be calculated with $\frac{d}{d\alpha} cwl = 0$. In order to
simplify this calculation, eq. (46) is evaluated by inserting the parametric model for $P_{1l}$
that results when the largest index $L$ in eq. (43) tends to infinity. This is justified by the
fact that in practice $L$ is very large, e.g. $L = 1024$. For $L \to \infty$, after differentiating
and re-ordering one gets eq. (47).

In eq. (47) the value of $\alpha$ is indirectly given by $z = e^{-(\alpha \cdot q_1)}$, see also
eq. (43), and $\bar{l}$ specifies the measured average value of the absolute first generation
indices $|l|$, i.e.

$$\bar{l} = \sum_l |l| \cdot f_l. \tag{48}$$

From eq. (47) one obtains the ML-estimate for $\alpha$,

$$z = e^{-(\alpha \cdot q_1)} = 1 - \frac{1 - f_0}{\bar{l}}. \tag{49}$$

Note that only the mean value $\bar{l}$ of eq. (48) and the relative frequency $f_0$ of the event
$y_1 = 0$ have to be measured from the first generation coefficients to determine the ML-estimate
of (49).
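The closed-form estimate of eq. (49) and its link to the average codeword length of eq. (45) can be illustrated with a small sketch. It assumes, for $L \to \infty$, model probabilities of the geometric form $P_{1l} = \frac{1 - f_0}{2}(1 - z) z^{|l| - 1}$ for $l \neq 0$ and $P_{10} = f_0$; this form, the function names and the sample indices are assumptions made for illustration, not quotations of eq. (43).

```python
import math
from collections import Counter


def ml_estimate_z(indices):
    """Eq. (49): z = exp(-alpha * q1) = 1 - (1 - f0) / mean(|l|)."""
    n = len(indices)
    f0 = sum(1 for l in indices if l == 0) / n
    mean_abs = sum(abs(l) for l in indices) / n  # eq. (48); must be non-zero
    return 1.0 - (1.0 - f0) / mean_abs


def avg_codeword_length(indices, z):
    """Cross-entropy of eq. (45) in bit per coefficient for the assumed model."""
    n = len(indices)
    counts = Counter(indices)
    f0 = counts.get(0, 0) / n
    cwl = 0.0
    for l, cnt in counts.items():
        if l == 0:
            p = f0
        else:
            p = 0.5 * (1.0 - f0) * (1.0 - z) * z ** (abs(l) - 1)
        cwl -= (cnt / n) * math.log2(p)
    return cwl


# Hypothetical first generation indices l = y1 / q1.
sample = [0, 0, 0, 0, 1, -1, 1, 2, -2, 3]
z_hat = ml_estimate_z(sample)
print(z_hat)
print(avg_codeword_length(sample, z_hat))
```

For this sample the mean absolute index is 1 and four of ten indices are zero, so the estimate is z = 0.4; evaluating the codeword length at neighbouring values such as 0.35 or 0.45 gives larger results, as expected for a maximum likelihood estimate.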

One obtains a corresponding result if the parameter $\alpha$ is not estimated from the first
generation coefficients $y_1$ but from the original dct-coefficients $x$. In this case eq. (44)
instead of eq. (43) is used to derive the ML-estimate

$$z = e^{-\alpha} = 1 - \frac{1 - f_0}{\bar{l} - (1 - f_0)(l_0 - 1)}, \tag{50}$$

where the mean value $\bar{l}$ is given by

$$\bar{l} = \sum_{i \ge l_0} i \cdot \frac{N_i}{N}. \tag{51}$$

As introduced earlier, the ratios $N_i/N$ in eq. (51) specify the relative frequencies of the
absolute values $|x| = i$, see also Fig. 11. The ML estimation of eqs. (50) and (51) coincides with
the ML estimation of eqs. (48) and (49) in the special case of $q_1 = l_0 = 1$. In general, one
obtains different estimation values for the parameter $\alpha$. As more information is provided by
the original dct-coefficients, the ML-estimation with (50) and (51) is more accurate; however, the
parameter $\alpha$ then has to be signalled as additional side information. In this case it may be
more convenient to signal $z = e^{-\alpha}$ rather than $\alpha$ because the $z$-values have a
normalised amplitude range, i.e. $0 < z < 1$, so that each $z$-value can be rounded to a fractional
binary number of e.g. 8 bit length. In contrast to (50) and (51), the ML-estimation with (48) and
(49) requires no additional side information that has to be sent to the transcoder.

As an example, the parameter $\alpha$ has been estimated for each curve of Fig. 11 from the
original dct-coefficients $x$ by applying eqs. (50) and (51). In order to determine the value of
$l_0$ the decision level $d_{11}$ has been set according to eq. (52), resulting in eq. (53), where
the function $\lceil a \rceil$ rounds the given argument $a$ up to the nearest integer. The
resulting ML-estimate of $z = e^{-\alpha}$ can be used for all $qscale_1$-values that are larger
than or equal to $qscale_{min} = 2$. According to eq. (53) the value of $l_0$ depends only on the
visual weight $W_i$ that changes with the horizontal and vertical frequency index of the
AC-coefficients. The weighting matrix of MPEG-2 test model TM5 has been used. Additionally, the
estimated $z$-values have been rounded to fractional binary numbers of 8 bit length. Figs. 12a -
12c show the resulting model probabilities of eq. (44) in comparison to the measured histograms of
Fig. 11. Note that in Figs. 12a-12c the model probabilities for the absolute amplitude levels are
shown, i.e. $P_{1l} + P_{1(-l)}$ for $|l| \ge 1$.

a.) horiz. freq. 1, vert. freq. 1; $W_i = 16$

Fig. 12a: Model probabilities acc. to eq. (44) (solid line) and histogram (dashed line)
for absolute amplitude levels of original dct-coefficients, test signal 'mobile'

Fig. 12b: Model probabilities acc. to eq. (44) (solid line) and histogram (dashed line)
for absolute amplitude levels of original dct-coefficients, test signal 'mobile'

Fig. 12c: Model probabilities acc. to eq. (44) (solid line) and histogram (dashed line)
for absolute amplitude levels of original dct-coefficients, test signal 'mobile'



6. Evaluation of the mse and the map cost function for the parametric model 

In this section both the mse and the map cost function are evaluated for the parametric model
of the pdf $p(x)$ of eq. (41). The model parameter $\alpha$ can be estimated as described in the
previous section. In order to ease the comparison between the mse and the map cost function, the
resulting decision rules of eq. (34) and (38) are considered, respectively.

Firstly, the mse decision rule of eq. (34) is evaluated. The pdf $p(x)$ determines the parameter
$\mu_{1l}$ that is related to the local centroid $c_{1l}$ according to eq. (30). The local centroid
$c_{1l}$ can be calculated for the decision levels of eqs. (25) and (26) by inserting the
parametric model of eq. (41) in eqs. (22) and (23). Without loss of generality a positive index
$l \ge 1$ can be considered; the result can be mirrored for the corresponding negative index $-l$.
The calculus then yields

$$c_{1l} = d_{1l} + \frac{1}{\alpha} - q_1 \cdot \frac{e^{-(\alpha \cdot q_1)}}{1 - e^{-(\alpha \cdot q_1)}}. \tag{54}$$

From eq. (30) one gets $c_{1l} = d_{1l} + \frac{\mu_{1l}}{2} \cdot q_1$, and by comparison with
eq. (54)

$$\mu_{1l} = 2 \cdot \left[ \frac{1}{\alpha \cdot q_1} - \frac{e^{-(\alpha \cdot q_1)}}{1 - e^{-(\alpha \cdot q_1)}} \right]. \tag{55}$$



It follows from eq. (55) that the local centroid parameter $\mu_{1l}$ does not depend on the index
$l$ that is selected in the transcoder by the actual first generation coefficient
$y_1 = r_{1l} = l \cdot q_1$; rather, $\mu_{1l}$ depends only on the estimated parameter $\alpha$
and the step-size $q_1$ that is decoded from the first generation bit stream. Hence, the
Laplacian-like pdf defined by the parametric model of eq. (41) is another case that results in a
constant parameter $\mu_{1l}$ apart from the special cases already discussed in the last paragraph
of Section 4.1.1. Consequently, the parametric model of eq. (41) also reduces the complexity for
implementing the mse decision rule in the transcoder.
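That the centroid parameter is independent of the interval index can be checked numerically. The sketch below compares the closed form of eq. (55) with a direct numerical evaluation of eqs. (22), (23) and (30) for an exponentially decaying density on two different intervals; the function names and the values alpha = 0.3 and q1 = 8 are hypothetical.

```python
import math


def mu_mse(alpha, q1):
    """Closed-form centroid parameter of eq. (55) for the Laplacian-like model."""
    z = math.exp(-alpha * q1)
    return 2.0 * (1.0 / (alpha * q1) - z / (1.0 - z))


def mu_numeric(alpha, q1, d_lo, steps=20_000):
    """Centroid parameter from eqs. (22), (23) and (30) by midpoint integration."""
    h = q1 / steps
    mass = 0.0
    moment = 0.0
    for k in range(steps):
        x = d_lo + (k + 0.5) * h
        w = math.exp(-alpha * x)  # model pdf up to a constant factor that cancels
        mass += w
        moment += x * w
    centroid = moment / mass
    return 2.0 * (centroid - d_lo) / q1


print(mu_mse(0.3, 8.0))
print(mu_numeric(0.3, 8.0, 21.0), mu_numeric(0.3, 8.0, 101.0))
```

All three values agree (about 0.634 for these numbers), and for very small alpha the closed form tends to 1, which recovers the uniform-pdf case.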

Secondly, the parametric model is evaluated for the map decision rule of eq. (38). After the
computation of the a-posteriori probabilities of eqs. (36) and (37) for the decision levels of eqs.
(25), (26) and the model pdf of eq. (41), the map decision rule of eq. (38) can be rewritten in a
similar form to the mse decision rule of eq. (34),

$$y_2 = Q_2(y_1 = r_{1l}) = \begin{cases} r_{2(L-1)} & \text{if } \dfrac{d_{L,ref} - d_{1l}}{q_1} > \dfrac{\mu_{map}}{2} \\ r_{2L} & \text{if } \dfrac{d_{L,ref} - d_{1l}}{q_1} < \dfrac{\mu_{map}}{2} \end{cases} \tag{56}$$

The map-threshold in eq. (56) is given by

$$\frac{\mu_{map}}{2} = \frac{1}{\alpha \cdot q_1} \cdot \ln\left( \frac{2}{1 + e^{-(\alpha \cdot q_1)}} \right), \tag{57}$$

where $\ln(a)$ returns the natural logarithm, i.e. the logarithm relative to the base $e$, of the
argument $a$.

Thus we see that the map decision rule can be implemented in essentially the same way as the
mse decision rule for the parametric model of eq. (41). The main differences are that the
mse-threshold is defined by eq. (55) but the map-threshold by eq. (57), and that the decision
levels $d_{L,ref}$ are fixed in the mse case but can vary depending on the reference quantiser
$Q_{2,ref}$ in the map case as discussed earlier in Section 4.2.

The same estimated parameter $\alpha$ can be applied to both the mse and the map cost functions.
Therefore, the transcoder has the option to switch locally, e.g. on an 8x8 block basis, between the
mse and the map cost function with no further increase of additional side information.
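The threshold form of the map rule, eqs. (56) and (57), can be sketched as follows. The code assumes the threshold is written as $\mu_{map}/2$ on the right hand side of eq. (56); the function names, the tie-break towards the lower level and the numbers are hypothetical.

```python
import math


def mu_map(alpha, q1):
    """Map-threshold parameter, i.e. twice the right hand side of eq. (57)."""
    a = alpha * q1
    return (2.0 / a) * math.log(2.0 / (1.0 + math.exp(-a)))


def map_threshold_decision(d_lo, d_ref, q1, alpha, r_low, r_high):
    """Eq. (56): compare (d_ref - d_lo) / q1 with mu_map / 2; a tie goes to r_low."""
    p_uni = (d_ref - d_lo) / q1
    return r_low if p_uni >= mu_map(alpha, q1) / 2.0 else r_high


# Hypothetical numbers: interval starting at 21, decision level 24, q1 = 8.
print(mu_map(0.15, 8.0) / 2.0)
print(map_threshold_decision(21.0, 24.0, 8.0, 0.15, 16, 32))
print(map_threshold_decision(21.0, 24.0, 8.0, 0.05, 16, 32))
```

The threshold marks the position inside the interval at which the a-posteriori probability of eq. (36) equals one half, so only a single comparison per coefficient is required once alpha and q1 are known.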

7. Experimental results 

In order to verify the theoretical results derived in the previous sections, the transcoding
set-up of Figs. 3 and 4 has been simulated for ten consecutive frames of the CCIR-601 [CCIR-601]
formatted test signal 'mobile'. As the MPEG-2 test model TM5 [TM5-93] is a widely acknowledged
reference for an MPEG-2 standalone encoder, the first generation quantiser $Q_1$ has been
fixed to the TM5 quantiser characteristic throughout the experiments.

The transcoder's quantiser characteristic $Q_2$ has been set to TM5 in the first experiment. As an
example, the corresponding qscale value has been fixed to $qscale_2 = 32$, which corresponds
approximately to the adjustment in I-frames that is used for a 3 Mbit/s simulation including
P- and B-frames. According to the linear qscale table of MPEG-2, the qscale value of the
TM5-quantiser used in the first generation encoder has been varied in the range
$qscale_1 = 2, 4, 6, 8, \ldots, 32$. Thus, transcoding is simulated for different first generation
bit rates and a fixed target bit rate for the second generation. The resulting PSNR values for the
image signal reconstructed from the second generation bit stream are shown in Fig. 13a
as a function of $qscale_1$. For reference, the solid line shows the resulting PSNR value of a
standalone TM5-encoder that by-passes the first generation and directly encodes the original
signal with a qscale value of 32. The Peak-Signal-to-Noise-Ratio is related to the
mean-squared-error between the original and the second generation signal as
$PSNR = 10 \cdot \log_{10}[255^2 / mse]$. In addition, Fig. 13b conveys the bit amount needed to
encode the second generation AC-coefficients with the MPEG-2 intra vlc table.

Fig. 13a: PSNR of 2nd generation signal as a function of qscale1 for fixed qscale2 = 32,
TM5, test signal 'mobile'

Fig. 13b: AC-bit amount of 2nd generation signal as a function of qscale1 for fixed qscale2 = 32,
TM5, test signal 'mobile'
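For reference, the PSNR measure used in Fig. 13a can be computed as in the following minimal sketch. It assumes 8-bit samples with a peak value of 255 and images given as flat sequences of sample values; the function name is hypothetical.

```python
import math


def psnr(original, reconstructed, peak=255.0):
    """PSNR in dB, i.e. 10 * log10(peak^2 / mse), for two equally long sample sequences."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("sequences must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak * peak / mse)


print(psnr([100, 100, 100, 100], [101, 99, 101, 99]))
```

In this example every sample is off by one, so the mse is 1 and the result equals 20 * log10(255), roughly 48.13 dB.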

The PSNR values drop dramatically by more than 1 dB for medium values of $qscale_1$ in the
range of 16-20 when compared to the reference. At the same time, the resulting bit amount
changes considerably. In this case the resulting ratio $qscale_2/qscale_1 = q_2/q_1$ is around 1,
which is unfavourable for transcoding as explained during the discussion of eq. (17) in Section
3. As an example, for $qscale_1 = 20$, the PSNR value is about 1.5 dB below the reference line while
the resulting bit amount is approximately (within 3%) the same for the reference and the
transcoded signal, see Figs. 13a, 13b. When $qscale_1$ is increased to 22, the PSNR value recovers
and is only 0.1 dB below the reference line; however, the bit amount now dramatically exceeds the
reference line by about 29%. These considerable changes between consecutive $qscale_1$ values
would have a most unpleasant effect on any rate control scheme that is used in the transcoder,
thus resulting in a poor picture quality for the second generation. The PSNR value recovers
when the $qscale_1$ value approaches $qscale_2$, although the resulting bit amount is rather high.
For $qscale_1 = qscale_2 = 32$, transparency is achieved, i.e. the first and the second generation
bit stream are identical. The TM5-quantiser provides the best results for small values of
$qscale_1$ in the range 2-8. Clearly, the first generation dct-coefficients contain less
quantisation noise when $qscale_1$ is small, and as a consequence, the difference between the
second generation and the reference signal also becomes small. Similar results have been obtained
for other test signals [OW-BBC-1].

In addition to the TM5 quantiser characteristic, the transcoder has the option to apply the mse
or the map cost function. The decision to select either of them may be based on the
rate-distortion performance. While the resulting PSNR/mse-values can be estimated and compared in
the transcoder by using the parametric model of eq. (41) for the pdf $p(x)$, it would be rather
complex if the resulting bit amount had to be computed for each cost function by forming the
MPEG-2 compatible two-dimensional (runlength/amplitude)-events of each 8x8 block and by
looking up each codeword length from the intra vlc table. It is less complex for the transcoder
to calculate the first order source entropy, as in eq. (45), because only a histogram count for the
second generation coefficients is then needed. Fig. 14a shows the bit amount for the MPEG-2
intra vlc of Fig. 13b scaled down for 8x8 blocks in comparison to the source entropy summed
up for all 63 AC-coefficients. It can be seen that the two curves shown in Fig. 14a have
essentially the same shape. This is confirmed if one calculates the ratio between them for each
$qscale_1$ value, see Fig. 14b. The ratio is almost constant, i.e. 0.84. Thus, a comparison of the
mse and the map cost function and the TM5 performance in terms of bit rate can also be carried out
on the basis of the source entropy rather than by applying the MPEG-2 intra vlc. Another
interesting result of Fig. 14b is that up to approx. 16% of the bit rate could be saved by using
the AC-entropy source model instead of the MPEG two-dimensional (runlength/amplitude) model.
However, this result cannot be exploited for MPEG-2 compatible bit streams.

Fig. 14a: Bit amount of 2nd generation AC coefficients: MPEG-2 Intra VLC vs.
first order source entropy, test signal 'mobile'

Fig. 14b: Bit amount ratio: First order source entropy over MPEG-2 Intra VLC
(solid line), ratio = 0.84 (dashed line), test signal 'mobile'

Fig. 15a: PSNR of 2nd generation signal as a function of qscale1 for fixed qscale2 = 32;
mse (upper line), map $\lambda_{2,ref}$ = 0.90 (middle line), TM5 (dashed line), test signal 'mobile'
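The histogram-based bit estimate mentioned above, i.e. the first order source entropy summed over all 63 AC positions of an 8x8 block, can be sketched as follows. The data layout (each block as a sequence of 64 coefficient values with the DC coefficient at index 0) and the function names are assumptions made for this illustration.

```python
import math
from collections import Counter


def first_order_entropy(values):
    """First order entropy in bit per coefficient from a histogram count."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def ac_entropy_per_block(blocks):
    """Sum of the per-frequency entropies over the 63 AC positions (index 0 is DC)."""
    return sum(first_order_entropy([b[k] for b in blocks]) for k in range(1, 64))


# Two hypothetical blocks that differ only in the first AC coefficient.
block_a = [50] + [0] * 63
block_b = [52] + [16] + [0] * 62
print(ac_entropy_per_block([block_a, block_b]))
```

Only one AC position carries two equally likely values here, so the summed entropy is 1 bit per block; in a transcoder the same count would be run over the second generation coefficients of a whole frame.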



[Plot for Fig. 15b: AC-entropy per 8x8 block of 2nd generation [bit] (axis range 25 to 60) versus qscale1 (2 to 32); curves: mse, map, TM5]



Fig. 15b: AC-entropy of 2nd generation signal as a fiinction of qscalej for fixed qscale2 - 32; 

msc(uppcr line). mapAj^f = 0.90(nuddle line). TMS(dashed line), test signal 'raobUe' 

The mse and the map cost function have been simulated in the next experiment. The reference
quantiser Q2,ref of the map cost function complies with eqs. (3) and (4); as an example, the
decision level parameter has been set to λ2,ref = 0.90. The pdf parameter α has been estimated
for each AC-frequency index as described in the example given in Section 5, see Figs. 12a-12c,
with respect to the original dct-coefficients, resulting in 63 x 8 = 504 bit additional side
information. We recall that these α-values do not depend on the actual qscale1 value as they are
estimated according to eqs. (52) and (53). Thus, the additional side information of 504 bit is the
same for all first generation bit streams. A more accurate estimation of α is possible if the
actual qscale1 value is inserted in eq. (52) instead of qscale_min = 2; however, the additional
side information would then depend on the first generation encoding process.
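
As a rough illustration of the side information budget, the following sketch estimates one Laplacian parameter for a single AC frequency index and counts the bits needed when 63 such parameters are each sent as an 8 bit word. It uses the textbook maximum likelihood estimate α = N / Σ|x| for a continuous Laplacian pdf p(x) = (α/2)·exp(-α|x|). This is an assumption standing in for eqs. (52) and (53), which are not reproduced in this section, and the function names `estimate_alpha` and `side_info_bits` are hypothetical.

```python
def estimate_alpha(coeffs):
    """Textbook ML estimate of alpha for p(x) = (alpha/2) * exp(-alpha * |x|).

    Assumption: continuous, unquantised coefficients; not the document's eq. (52).
    """
    total = sum(abs(x) for x in coeffs)
    if total == 0:
        raise ValueError("all coefficients are zero; alpha is unbounded")
    return len(coeffs) / total


def side_info_bits(num_ac_indices=63, bits_per_parameter=8):
    """One parameter per AC frequency index of an 8x8 block, each a fixed-length word."""
    return num_ac_indices * bits_per_parameter


if __name__ == "__main__":
    # Hypothetical coefficients collected at one AC frequency index over several blocks.
    samples = [1.0, -2.0, 3.0, -2.0]
    print("alpha estimate:", estimate_alpha(samples))
    print("side information [bit]:", side_info_bits())
```

Because the estimate depends only on the original coefficients, the same 504 bit could accompany any first generation bit stream, which matches the remark above.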

A comparison of the cases where either the TM5 quantiser, the mse or the map cost function is
used in the transcoder is shown in Figs. 15a and 15b. The mse cost function achieves the largest
PSNR values, see Fig. 15a. For example, in the case of qscale1 = 20, the PSNR value can be
increased by approx. 1.1 dB from approx. 25.6 dB for the TM5-quantiser to approx. 26.7 dB for
the mse cost function. However, the mse cost function comes at a price as the entropy of the
AC-coefficients is significantly increased, see Fig. 15b. The mse cost function may therefore not
be applicable throughout. However, the mse cost function can be used locally to transcode
blocks with critical image content. As a further option, the mse cost function can be applied
only to AC-coefficients with low frequency index because the human visual system is in general
more sensitive to quantisation noise that is added to low frequencies.

The map cost function is more suitable in a rate-distortion sense compared with the mse cost
function. For the critical case of qscale1 = 20, the AC-entropy is approximately 4.7% smaller
and at the same time the PSNR value is approximately 0.4 dB larger in comparison to TM5.
Figs. 15a and 15b show just one example for the map cost function. As the map cost function
is governed by the parameter λ2,ref, a set of rate-distortion characteristics can be generated
by varying λ2,ref, thus allowing a smooth transition between the TM5-quantiser performance
and the mse cost function in terms of PSNR and corresponding bit rate. The results for
different values of λ2,ref in Figs. 16a and 16b show a monotonic behaviour: when λ2,ref is
increased, both the PSNR value and the AC-entropy are increased, too.
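
The trend of rising fidelity and rising entropy with a larger decision level parameter can be reproduced in a much simpler setting. The sketch below applies a single (q, λ) quantiser directly to synthetic Laplacian data and reports mse and first order entropy for several λ values. It assumes the quantiser form level = floor(|x|/q + λ/2)·sgn(x) with reconstruction at level·q (eqs. (3) and (4) are not reproduced in this section), it uses random data rather than the 'mobile' test signal, and it does not model the first generation stage or the map decision rule.

```python
import math
import random
from collections import Counter


def quantise(x, q, lam):
    """Level index of an assumed (q, lambda) quantiser: floor(|x|/q + lambda/2) with sign."""
    level = math.floor(abs(x) / q + lam / 2.0)
    return level if x >= 0 else -level


def entropy_bits(levels):
    """First order source entropy in bit per symbol."""
    counts = Counter(levels)
    n = len(levels)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def sweep(coeffs, q, lambdas):
    """Return (lambda, mse, entropy) for each lambda, reconstructing at level * q."""
    rows = []
    for lam in lambdas:
        levels = [quantise(x, q, lam) for x in coeffs]
        mse = sum((x - l * q) ** 2 for x, l in zip(coeffs, levels)) / len(coeffs)
        rows.append((lam, mse, entropy_bits(levels)))
    return rows


def laplacian_samples(n, scale, seed=0):
    """Synthetic Laplacian data (an assumption; not the 'mobile' test signal)."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) * rng.expovariate(1.0 / scale) for _ in range(n)]


if __name__ == "__main__":
    data = laplacian_samples(20000, scale=10.0)
    for lam, mse, h in sweep(data, q=16.0, lambdas=(0.75, 0.80, 0.85, 0.90, 0.95)):
        print(f"lambda={lam:.2f}  mse={mse:.2f}  entropy={h:.3f} bit")
```

In this toy setting a larger λ moves each decision level closer to the midpoint between representation levels, so the mse falls (the PSNR rises) while fewer coefficients are mapped to zero and the entropy grows.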

Fig. 16a: PSNR of 2nd generation signal as a function of qscale1 for fixed qscale2 = 32;
map cost function for different values of λ2,ref, test signal 'mobile'
λ2,ref = 0.95 (upper line), 0.90, 0.85, 0.80, 0.75 (lower line)

Fig. 16b: AC-entropy of 2nd generation signal as a function of qscale1 for fixed qscale2 = 32;
map cost function for different values of λ2,ref, test signal 'mobile'
λ2,ref = 0.95 (upper line), 0.90, 0.85, 0.80, 0.75 (lower line)

In the last experiment the impact of estimating the pdf parameter α not from the original
dct-coefficients but from the first generation coefficients is investigated. No additional side
information is necessary if α is estimated from the first generation coefficients. A comparison
between the two methods is shown in Figs. 17a and 17b for the mse cost function. While the
PSNR/mse performance in Fig. 17a is almost identical with and without additional side
information, one can see from Fig. 17b that the resulting bit amount is significantly higher in
some cases without side information. Thus, not very surprisingly, the results are in favour of
sending additional side information. This conclusion is not necessarily true for the map cost
function, as shown for the example of λ2,ref = 0.90 in Figs. 18a and 18b. Similar to the mse cost
function, the AC-entropy is significantly higher for some qscale1 values when no side
information is sent (Fig. 18b); however, the resulting PSNR values also exceed the corresponding
PSNR values for the case of additional side information. Thus, a good performance can be achieved
even when no additional side information is sent to the transcoder.

Fig. 17a: PSNR of 2nd generation signal as a function of qscale1 for fixed qscale2 = 32;
mse cost function with and without additional side information, test signal 'mobile'

Fig. 17b: AC-entropy of 2nd generation signal as a function of qscale1 for fixed qscale2 = 32;
mse cost function with and without additional side information, test signal 'mobile'

Fig. 18a: PSNR of 2nd generation signal as a function of qscale1 for fixed qscale2 = 32;
map cost function with and without additional side information, λ2,ref = 0.90,
test signal 'mobile'

Fig. 18b: AC-entropy of 2nd generation signal as a function of qscale1 for fixed qscale2 = 32;
map cost function with and without additional side information, λ2,ref = 0.90,
test signal 'mobile'

8. Conclusions and future work

This paper discusses transcoding of MPEG-2 intra frames. A theoretical analysis of the
transcoding problem is carried out with emphasis on designing the quantiser characteristic of the
transcoder. The comparison with a reference quantiser of a standalone encoder shows that
transcoding only results in an equivalent overall quantiser characteristic under special
conditions. This is indicated by the ratio of the first and the second generation quantisation
step-sizes. Only for certain ratios is the mapping of the original dct-coefficients onto the
second generation coefficients the same for the reference quantiser and the overall quantiser
characteristic that results from transcoding. In general, for an arbitrary ratio, the standalone
reference quantiser characteristic cannot be achieved during transcoding. As a consequence, there
is a loss due to transcoding when the mean-squared-error (mse) or the rate-distortion performance
is considered. In order to minimise this loss, the degree of freedom that lies in selecting the
decision levels for an MPEG-2 compatible quantiser can be exploited.
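
The dependence on the step-size ratio can be made concrete with a small numerical check. The sketch below compares the cascade Q2(Q1(x)) with a standalone reference quantiser Q2,ref(x) over a grid of non-negative amplitudes and reports how often the two disagree. It assumes the quantiser form floor(x/q + λ/2)·q and, purely for illustration, sets all three λ values to 1 (plain rounding); the step sizes 4, 8 and 12 and the function names are hypothetical choices, not values from the experiments.

```python
import math


def reconstruct(x, q, lam):
    """Reconstructed amplitude of an assumed (q, lambda) quantiser, for x >= 0."""
    return math.floor(x / q + lam / 2.0) * q


def mismatch_fraction(q1, q2, lam1, lam2, lam_ref, x_max=200.0, step=0.125):
    """Fraction of test amplitudes where the cascade Q2(Q1(x)) differs from Q2_ref(x)."""
    n = int(x_max / step)
    mismatches = 0
    for k in range(n):
        x = k * step
        cascade = reconstruct(reconstruct(x, q1, lam1), q2, lam2)
        direct = reconstruct(x, q2, lam_ref)
        mismatches += cascade != direct
    return mismatches / n


if __name__ == "__main__":
    for q2 in (8.0, 12.0):
        frac = mismatch_fraction(q1=4.0, q2=q2, lam1=1.0, lam2=1.0, lam_ref=1.0)
        print(f"q2/q1 = {q2 / 4.0:.0f}: mismatch fraction = {frac:.3f}")
```

With these settings a ratio of 3 places every reference decision level on a first generation decision level, so the cascade reproduces the reference mapping, whereas a ratio of 2 does not and some amplitudes end up on a different second generation level.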

Two approaches for the adjustment of the decision levels in the transcoder are proposed,
explained and compared. Firstly, the minimisation of the mse cost function is taken as a guideline
for adjusting the decision levels. The objective of the mse cost function is to give the smallest
mse values between the original and the second generation coefficients. The resulting mse
decision rule can be implemented in the transcoder in essentially the same way as the first
generation quantiser characteristic. However, the mse cost function comes at a price as no
attention is paid to the bit rate needed to encode the second generation coefficients. Therefore,
the maximum a-posteriori (map) cost function is additionally introduced. The map cost function
is more suitable in a rate-distortion sense than the mse cost function. While the standalone
reference quantiser is fixed for the mse cost function, the map cost function has the additional
freedom to choose the standalone reference quantiser. Thus, by changing the reference quantiser,
a set of rate-distortion characteristics can be generated with the map cost function.
Interestingly, the map cost function is again given by a mean-squared-error (mse); however, in
contrast to the mse cost function, the mse between the output of the standalone reference
quantiser and the second generation coefficients is minimised.
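
The difference between the two cost functions can be sketched numerically for a single first generation level. The code below is a minimal illustration under stated assumptions, not the decision rules derived in the paper: it assumes the quantiser form floor(x/q + λ/2), a one-sided Laplacian weight exp(-αx) for positive amplitudes, and a brute-force midpoint integration over the first generation interval. The mse rule uses the original coefficient as target, the map rule uses the output of the reference quantiser as target; all function names and the numerical values (q1 = 10, q2 = 20, α = 0.1, λ1 = 0.75, λ2,ref = 0.9) are hypothetical.

```python
import math


def ref_quantise(x, q, lam):
    """Reconstructed amplitude of an assumed (q, lambda) quantiser, for x >= 0."""
    return math.floor(x / q + lam / 2.0) * q


def expected_cost(y2, level1, q1, lam1, alpha, target, steps=2000):
    """E[(target(x) - y2)^2 | first generation level] under a weight exp(-alpha * x).

    Midpoint-rule integration over the first generation interval
    [(level1 - lam1/2) * q1, (level1 + 1 - lam1/2) * q1), valid for level1 >= 1.
    """
    lo = (level1 - lam1 / 2.0) * q1
    dx = q1 / steps
    num = den = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        w = math.exp(-alpha * x)
        num += w * (target(x) - y2) ** 2
        den += w
    return num / den


def requantise(level1, q1, lam1, q2, alpha, target):
    """Second generation level that minimises the expected cost for a given target."""
    lo = (level1 - lam1 / 2.0) * q1
    hi = lo + q1
    candidates = range(math.floor(lo / q2), math.ceil(hi / q2) + 1)
    return min(candidates,
               key=lambda l2: expected_cost(l2 * q2, level1, q1, lam1, alpha, target))


def mse_rule(level1, q1, lam1, q2, alpha):
    """Target is the original coefficient itself."""
    return requantise(level1, q1, lam1, q2, alpha, target=lambda x: x)


def map_rule(level1, q1, lam1, q2, alpha, lam2_ref):
    """Target is the output of the standalone reference quantiser."""
    return requantise(level1, q1, lam1, q2, alpha,
                      target=lambda x: ref_quantise(x, q2, lam2_ref))


if __name__ == "__main__":
    args = dict(level1=1, q1=10.0, lam1=0.75, q2=20.0, alpha=0.1)
    print("mse rule level:", mse_rule(**args))
    print("map rule level:", map_rule(**args, lam2_ref=0.9))
```

For this hypothetical case with a step-size ratio of two, the mse rule keeps the coefficient at level 1 while the map rule sets it to zero, which illustrates why the map rule tends to spend fewer bits.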

The statistical distribution of the original dct-coefficients is needed to apply both the mse and
the map cost function in the transcoder. A parametric model based on a Laplacian probability
density function (pdf) is proposed. One parameter for each frequency component allows
adaptation to the actual signal statistic. This parameter can be estimated either in the transcoder
from the first generation coefficients or from the original dct-coefficients. In general, the latter
results in a more accurate estimate; however, the parameter has then to be transmitted as
additional side information along with the first generation bit stream. For both cases, a maximum
likelihood estimation rule is derived. The parametric model is validated with real image data;
experimental results confirm that the model pdf is suitable for describing the distribution of
the dct-coefficients.

The mse and the map cost function are evaluated for the parametric model. It is shown that the
proposed model pdf also simplifies the implementation, and that the resulting mse and map
decision rules can then be stated in very similar forms.

Experimental results confirm the effectiveness of the mse and the map cost function. For
reference, the quantiser characteristic of the MPEG-2 reference encoder TM5 [TM5-93] has also been
used in the transcoder. The results show large changes in terms of PSNR values and bit rate of
the second generation coefficients for a ratio around two of the second and first generation
quantisation step-sizes. The PSNR value can drop by about 1.5 dB while the bit rate remains
constant or, conversely, the PSNR value remains rather constant while the bit rate is increased
by almost 30%. This causes problems for a rate controller that is used in the transcoder. The
mse cost function achieves the largest PSNR values, resulting in up to 1.1 dB gain compared to
TM5. However, the mse cost function also leads to the largest bit rates and may therefore only
be applied locally to blocks with critical image content. As a further option, the mse cost
function can be applied only to AC-coefficients with low frequency indices because the human
visual system is in general more sensitive to quantisation noise that is added to low frequencies.

Experimental results show that in comparison to TM5, the map cost function can lead to a
smaller bit rate (4.7 %) and at the same time to a larger PSNR value (0.4 dB) in critical cases,
thus resulting in a better rate-distortion performance. By changing the reference quantiser of
the map cost function, a set of rate-distortion characteristics can be generated allowing a
smooth transition between the rate-distortion performance of the mse cost function and TM5.
In the experiments, the class of reference quantisers is spanned by a single parameter; results
show a monotonic behaviour in that an increase of the reference quantiser parameter leads to
a larger PSNR value and to a larger bit rate.

Experimental results confirm that for the mse cost function the estimation of the pdf model
parameter from the original dct-coefficients leads to a better rate-distortion performance than
estimating the model parameter from the first generation coefficients. This is not necessarily true
for the map cost function, as revealed in one example. Thus, a good transcoding performance
can also be achieved when no additional side information is transmitted along with the first
generation bit stream.

It is further shown that the first order source entropy of the second generation coefficients can
be used to derive an estimate of the bit rate that results from the MPEG-2 intra VLC codeword
table. This would simplify the computation of the bit rate if the transcoder had to decide upon
either the TM5, the mse or the map cost function based on the best rate-distortion performance.
The resulting PSNR values can be compared in the transcoder on the basis of the Laplacian
model pdf. This could be simplified for the map cost function due to the monotonic behaviour
of the rate-distortion performance, e.g. after setting a target bit rate on a frame or block basis,
the parameter of the reference quantiser can be increased until the first order source entropy
exceeds the target bit rate. The investigation of an 'easy-to-implement' algorithm based on the
above rate-distortion considerations is a promising goal of future work. Furthermore, the
presented results can be adapted for transcoding of MPEG-2 inter-frames, i.e. P- and B-frames,
involving motion compensating prediction. However, the problem of drift [OW-94] [OW-96]
between the predictors of the encoder and the decoder has then additionally to be taken into
account.
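
The entropy-driven search outlined above might look roughly as follows. This is a minimal sketch under several assumptions: it scans a small, hypothetical grid of λ values upwards and keeps the last one whose first order entropy stays within the target; it applies an assumed (q, λ) quantiser of the form floor(|x|/q + λ/2) directly to the coefficients instead of running the full map decision rule; and it converts entropy to an estimated VLC bit amount with the ratio 0.84 shown for the 'mobile' signal in Fig. 14b, which may not hold for other material.

```python
import math
from collections import Counter


def quantise(x, q, lam):
    """Level index of an assumed (q, lambda) quantiser: floor(|x|/q + lambda/2) with sign."""
    level = math.floor(abs(x) / q + lam / 2.0)
    return level if x >= 0 else -level


def entropy_bits(levels):
    """Total first order source entropy of the level sequence, in bit."""
    counts = Counter(levels)
    n = len(levels)
    return -sum(c * math.log2(c / n) for c in counts.values())


def estimated_vlc_bits(levels, entropy_to_vlc_ratio=0.84):
    """Estimate the VLC bit amount from the entropy via an assumed fixed ratio."""
    return entropy_bits(levels) / entropy_to_vlc_ratio


def pick_lambda(coeffs, q, target_bits, lambdas=(0.75, 0.80, 0.85, 0.90, 0.95)):
    """Scan lambda upwards; return the last value whose entropy does not exceed the target.

    Returns None if even the smallest lambda exceeds the target.
    """
    chosen = None
    for lam in lambdas:
        bits = entropy_bits([quantise(x, q, lam) for x in coeffs])
        if bits > target_bits:
            break
        chosen = lam
    return chosen


if __name__ == "__main__":
    block = [1.0, 1.0, 1.0, 9.0]  # hypothetical coefficients
    print("chosen lambda:", pick_lambda(block, q=16.0, target_bits=1.0))
```

Stopping at the first λ that exceeds the target relies on the monotonic behaviour reported for the map cost function; without that property an exhaustive scan over all candidate values would be needed.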

CLAIMS

1. A method for compression encoding of a digital signal, including the
steps of conducting a transformation process to generate values and
quantising the values through partitioning the amplitude range of a
value into a set of adjacent intervals, whereby each interval is mapped
onto a respective one of a set of representation levels which are to be
variable length coded, such that a bound of each interval is controlled
by a parameter λ, characterised in that λ is controlled so as to vary
dynamically the bound of each interval with respect to the associated
representation level.

2. A method according to Claim 1, wherein each value is arithmetically
combined with λ.

3. A method according to Claim 1 or Claim 2, wherein λ is a function of
the quantity represented by the value.

4. A method according to Claim 3, wherein the transformation is a DCT
and λ is a function of horizontal and vertical frequency.

5. A method according to any one of the preceding claims, wherein λ is a
function of the quantisation step size.

6. A method according to any one of the preceding claims, wherein λ is a
function of the value amplitude.

7. A method according to any one of the preceding claims, wherein the
digital signal to be encoded has been subjected to previous encoding
and decoding processes and λ is controlled as a function of a
parameter in said previous encoding and decoding processes.

8. A method according to Claim 7, having quantisation step size q = q2
and a value of λ = λ2, in which the value to be quantised has previously
been quantised using a quantisation step size q = q1 and a value of λ
= λ1, wherein λ is a function of q1 and λ1.

9. A method according to Claim 7 or Claim 8, wherein λ is a function of
λref, where λref is the value of λ that would have been selected in a
method according to Claim 1 operating with a quantisation step size
q = q2 upon the value prior to quantisation with the quantisation step
size q = q1.

10. A method according to any one of the preceding claims, wherein the
quantisation step size q is independent of the input value, otherwise
than for the zero quantisation level.

11. A (q, λ) quantiser operating on a set of transform coefficients xk
representative of respective frequency indices fk in which λ is
dynamically controlled in dependence upon the values of xk and fk.

12. A quantiser according to Claim 11, wherein the parameters fk are
frequency indices.

13. A quantiser according to Claim 11 or Claim 12, in which λ is
dynamically controlled to minimise a cost function D + μH where D is a
measure of the distortion introduced by the quantisation in the
uncompressed domain and H is a measure of compressed bit rate and
μ is a constant determined by the bit rate constraint.

14. In a compression transcoder, operating on a compressed signal
quantised in a first (q1, λ1)-type quantiser, a second (q2, λ2)-type
quantiser in which the second generation λ2 value is controlled as a
function:



15. A compression transcoder according to Claim 14 in which the
parameter λ2,ref represents a notional reference (q2, λ2,ref)-type
quantiser which bypasses the first generation coding and directly maps
an original amplitude onto a second generation reference amplitude.

16. A compression transcoder according to Claim 14 in which the
parameter λ2,ref is selected empirically.

17. A compression transcoder according to Claim 16 in which the
parameter λ2,ref is fixed for each frequency.